BV ESTIMATES IN OPTIMAL TRANSPORTATION AND APPLICATIONS


GUIDO DE PHILIPPIS, ALPAR RICHARD MESZAROS, FILIPPO SANTAMBROGIO, 

AND BOZHIDAR VELICHKOV 


Abstract. In this paper we study the BV regularity for solutions of certain variational
problems in Optimal Transportation. We prove that the Wasserstein projection of a measure with
BV density on the set of measures with density bounded by a given BV function $f$ is of bounded
variation as well and we also provide a precise estimate of its BV norm. Of particular interest is
the case $f = 1$, corresponding to a projection onto a set of densities with an $L^\infty$ bound, where
we prove that the total variation decreases by projection. This estimate and, in particular,
its iterations have a natural application to some evolutionary PDEs as, for example, the ones
describing a crowd motion. In fact, as an application of our results, we obtain BV estimates for
solutions of some non-linear parabolic PDE by means of optimal transportation techniques. We
also establish some properties of the Wasserstein projection which are interesting in their own
right, and allow for instance to prove uniqueness of such a projection in a very general framework.


1. Introduction 

Among variational problems involving optimal transportation and Wasserstein distances, a
very recurrent one is the following

(1.1) $\min_{\varrho \in \mathcal{P}_2(\Omega)} \frac{1}{2} W_2^2(\varrho, g) + \tau F(\varrho),$

where $F$ is a given functional on probability measures, $\tau > 0$ a parameter which can possibly be
small, and $g$ is a given probability in $\mathcal{P}_2(\Omega)$ (the space of probability measures on
$\Omega \subset \mathbb{R}^d$ with finite second moment $\int |x|^2 \, d\varrho(x) < +\infty$). This very instance of the problem
is exactly the one we face in the time-discretization of the gradient flow of $F$ in $\mathcal{P}_2(\Omega)$, where
$g = \varrho_k^\tau$ is the measure at step $k$, and the optimal $\varrho$ will be the next measure $\varrho_{k+1}^\tau$. Under
suitable assumptions, at the limit when $\tau \to 0$, this sequence converges to a curve of measures
which is the gradient flow of $F$ (see [2, 1] for a general description of this theory).

The same problem also appears in other frameworks, for fixed $\tau$. For instance in image
processing, if $F$ is a smoothing functional, this is a model to find a better (smoother) image $\varrho$
which is not so far from the original $g$ (the choice of the distance $W_2$ in this case can be justified
by robustness arguments), see [15]. In some urban planning models (see [5, 23]) $g$ represents
the distribution of some resources and $\varrho$ that of population, which on one side is attracted by
the resources $g$ and on the other avoids creating zones of high density, thus guaranteeing enough
space for each individual. In this case the functional $F$ favors diffused measures, for instance
$F(\varrho) = \int h(\varrho(x)) \, dx$, where $h$ is a convex and superlinear function, which gives a higher cost to
high densities of $\varrho$. Alternatively, $g$ could represent the distribution of population, and $\varrho$ that
of services, to be chosen so that they are close enough to $g$ but more concentrated. This effect
can be obtained by choosing $F$ that favors concentrated measures.


When $F$ takes only the values $0$ and $+\infty$, (1.1) becomes a projection problem. Recently, the
projection onto the set¹ $K_1$ of densities bounded above by the constant 1 has received a lot of
attention. This is mainly due to its applications in the time-discretization of evolution problems
with density constraints typically associated to crowd motion. For a precise description of the
associated model we refer to [22, 16], where a crowd is described as a population of particles
which cannot overlap, and cannot go beyond a certain threshold density.

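In this paper we concentrate on the case where $F(\varrho) = \int h(\varrho)$ for a convex integrand
$h : \mathbb{R}_+ \to \mathbb{R} \cup \{+\infty\}$. The case of the projection on $K_1$ is obtained by taking the following function:

$h(\varrho) = \begin{cases} 0, & \text{if } 0 \le \varrho \le 1, \\ +\infty, & \text{if } \varrho > 1. \end{cases}$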

We are interested in the estimates on the minimizer $\bar\varrho$ of (1.1). In general they can be divided
into two categories: the ones which are independent of $g$ (but depend on $\tau$) and the ones uniform
in $\tau$ (dependent on $g$). A typical example of the first type of estimate can be obtained by writing
down the optimality conditions for (1.1). In the case $F(\varrho) = \int h(\varrho)$, we get $\varphi + \tau h'(\bar\varrho) = \mathrm{const}$,
where $\varphi$ is the Kantorovich potential in the transport from $\bar\varrho$ to $g$ (in fact this equality holds only
$\bar\varrho$-a.e., but we skip the details and just recall the heuristic argument). On a bounded domain,
$\varphi$ is Lipschitz continuous with a universal Lipschitz constant depending only on the domain,
and so is $\tau h'(\bar\varrho)$. If $h$ is strictly convex and $C^1$, then we can deduce the Lipschitz continuity for
$\bar\varrho$. The bounds on the Lipschitz constant of $\bar\varrho$ do not really depend on $g$, but on the other hand
they clearly degenerate as $\tau \to 0$. Another bound that one can prove is $\|\bar\varrho\|_{L^\infty} \le \|g\|_{L^\infty}$ (see
[7, 23]), which, on the contrary, is independent of $\tau$.

In this paper we are mainly concerned with BV estimates. As we expect uniform bounds, in
what follows we get rid of the parameter $\tau$.

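We recall that for every function $\varrho \in L^1$ and every open set $A$ the total variation of $\varrho$ in $A$
is defined as

$TV(\varrho, A) = \int_A |\nabla\varrho| = \sup \left\{ \int \varrho \, \mathrm{div}\, \xi \, dx : \xi \in C_c^1(A), \ |\xi| \le 1 \right\}.$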

Our main theorem reads as follows: 




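Theorem 1.1. Let $\Omega \subset \mathbb{R}^d$ be a (possibly unbounded) convex set, $h : \mathbb{R}_+ \to \mathbb{R} \cup \{+\infty\}$ be
a convex and l.s.c. function and $g \in \mathcal{P}_2(\Omega) \cap BV(\Omega)$. If $\bar\varrho$ is a minimizer of the following
variational problem

$\min_{\varrho \in \mathcal{P}_2(\Omega)} \frac{1}{2} W_2^2(\varrho, g) + \int_\Omega h(\varrho(x)) \, dx,$

then

(1.2) $\int_\Omega |\nabla\bar\varrho| \, dx \le \int_\Omega |\nabla g| \, dx.$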


As we said, this covers the case of the Wasserstein projection of $g$ on the subset $K_1$ of
$\mathcal{P}_2(\Omega)$ given by the measures with density less than or equal to 1. Starting from Theorem 1.1
and constructing an appropriate approximating sequence of functionals we are actually able
to establish BV bounds for more general Wasserstein projections related to a prescribed BV
function $f$. More precisely we have the following result.

¹ Here and in the sequel we denote by $K_f$ the set of absolutely continuous measures with density bounded by $f$:
$K_f := \{\varrho \in \mathcal{P}(\Omega) : \varrho \le f \, dx\}$.


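Theorem 1.2. Let $\Omega \subset \mathbb{R}^d$ be a (possibly unbounded) convex set, $g \in \mathcal{P}_2(\Omega) \cap BV(\Omega)$ and let
$f \in BV_{loc}(\Omega)$ be a function with

$\int_\Omega f \, dx \ge 1.$

If

(1.3) $\bar\varrho = \mathrm{argmin} \left\{ W_2^2(\varrho, g) : \varrho \in \mathcal{P}_2(\Omega), \ \varrho \le f \text{ a.e.} \right\},$

then

(1.4) $\int_\Omega |\nabla\bar\varrho| \, dx \le \int_\Omega |\nabla g| \, dx + 2 \int_\Omega |\nabla f| \, dx.$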


We would like to spend some words on the BV estimate for the projection on the set $K_1$,
which is the original motivation for this paper. We note that this corresponds to the case

$h(\varrho) = \begin{cases} 0, & \text{if } \varrho \in [0, 1], \\ +\infty, & \text{if } \varrho > 1, \end{cases}$

in Theorem 1.1 and to the case $f \equiv 1$ in Theorem 1.2. In both cases we obtain that (1.2) holds.



In dimension one the estimate (1.2) can be obtained by some direct considerations.
In fact, by [13] we have that the constraint $\bar\varrho \le 1$ is saturated, i.e. the projection is of
the form

$\bar\varrho(x) = \begin{cases} 1, & \text{if } x \in A, \\ g(x), & \text{if } x \notin A, \end{cases}$

for an open set $A \subset \mathbb{R}$. Since we are in dimension one, $A$ is a union of intervals
and so it is sufficient to show that (1.2) holds in the case that $A$ is just one interval,
as in the picture on the left. In this case it is immediate to check that the total
variation of $g$ has not increased after the projection since $\bar\varrho = 1$ on $A$, while there
is necessarily a point $x_0 \in A$ such that $g(x_0) \ge 1$.
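As a hedged numerical illustration of the one-dimensional argument, the sketch below takes a hypothetical tent-shaped density (height 1.5, total mass 1; it is not taken from the text) and assumes, as in the saturated form stated above, that the projection equals 1 on a symmetric interval $A = [-a, a]$ and coincides with $g$ elsewhere, with $a$ fixed by mass conservation. All function names are illustrative.

```python
def tent(x, height=1.5, half_width=2.0 / 3.0):
    """Hypothetical density g: a tent of height 1.5 with total mass 1."""
    return max(0.0, height * (1.0 - abs(x) / half_width))


def mass_on(a, cells=2000):
    """Midpoint-rule approximation of the integral of g over [-a, a]."""
    h = 2.0 * a / cells
    return h * sum(tent(-a + (i + 0.5) * h) for i in range(cells))


def saturation_radius(lo=0.25, hi=2.0 / 3.0, steps=60):
    """Bisection for a with mass_on(a) == 2a (mass of g on A equals |A|)."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if mass_on(mid) > 2.0 * mid:
            lo = mid  # A is still too small to hold this mass at density 1
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_tv(values):
    """Sum of absolute increments of a sampled profile."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


a = saturation_radius()
grid = [-1.0 + i / 1000.0 for i in range(2001)]
g_values = [tent(x) for x in grid]
# Assumed saturated form: density 1 on A = [-a, a], unchanged outside A.
proj_values = [1.0 if abs(x) <= a else tent(x) for x in grid]

tv_g = discrete_tv(g_values)
tv_proj = discrete_tv(proj_values)
print(a, tv_g, tv_proj)
```

For this particular profile the mass balance reduces to $a = 9a^2/4$, so $a = 4/9$, and the total variation drops from 3 to 2: the peak of height 1.5 is replaced by a plateau of height 1.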


In dimension $d \ge 2$ the estimate (1.2) is more involved, essentially due to the fact that the
projection tends to spread in all directions. This geometric phenomenon can be illustrated
with the following simple example. Consider the function $g = (1 + \varepsilon) \mathbb{1}_{B(0,R)}$, where $\varepsilon > 0$ and
$R > 0$ are such that $(1 + \varepsilon)|B(0, R)| = 1$. By the saturation of the constraint and symmetry
considerations the projection $\bar\varrho$ of $g$ is the characteristic function $\bar\varrho = \mathbb{1}_{B(0,\tilde R)}$, where
$\tilde R = (1 + \varepsilon)^{1/d} R$. The total variation involves two opposite effects: the perimeter of the ball increases,
but the height of the jump passes from $1 + \varepsilon$ to 1. In fact we have

$\int_{\mathbb{R}^d} |\nabla\bar\varrho| = d\omega_d \tilde R^{d-1} = d\omega_d R^{d-1} (1 + \varepsilon)^{(d-1)/d} < d\omega_d R^{d-1} (1 + \varepsilon) = \int_{\mathbb{R}^d} |\nabla g|.$
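The ball computation above can be checked numerically. The following is a minimal sketch; the function names and the sampled values of $d$ and $\varepsilon$ are illustrative choices, not part of the text.

```python
import math


def unit_ball_volume(d):
    """Volume omega_d of the unit ball in R^d."""
    return math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0)


def ball_example(d, eps):
    """Total variations of g = (1 + eps) 1_{B(0,R)} and of its projection."""
    omega = unit_ball_volume(d)
    # R is fixed by the mass constraint (1 + eps) |B(0, R)| = 1.
    radius = (1.0 / ((1.0 + eps) * omega)) ** (1.0 / d)
    new_radius = (1.0 + eps) ** (1.0 / d) * radius
    tv_g = (1.0 + eps) * d * omega * radius ** (d - 1)  # jump height times perimeter
    tv_proj = d * omega * new_radius ** (d - 1)         # jump of height 1
    return tv_g, tv_proj, omega * new_radius ** d


for d in (2, 3, 5):
    for eps in (0.1, 1.0, 10.0):
        tv_g, tv_proj, mass = ball_example(d, eps)
        print(d, eps, round(tv_proj / tv_g, 4), round(mass, 12))
```

The printed ratio equals $(1 + \varepsilon)^{-1/d}$, which is below 1 for every $\varepsilon > 0$ and approaches 1 as $d$ grows, consistent with the remark that spreading competes with the reduction of the jump.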

Further explicit examples are difficult to construct. Even in the case $g = (1 + \varepsilon) \mathbb{1}_\Omega$, where $\Omega$
is a union of balls, it is not trivial to compute the BV norm of the projection, which is the
characteristic function of a union of (overlapping) balls.

The BV estimates are useful when the projection is treated as one time-step of a discretized
evolution process. For instance, a BV bound allows one to transform weak convergence in the sense
of measures into strong $L^1$ convergence (see Section 6.3). Also, if we consider a PDE mixing a
smooth evolution, such as the Fokker-Planck evolution, and some projection steps (in order to
impose a density constraint, as in crowd motion issues), one could wonder which bounds on the
regularity of the solution are preserved in time. From the fact that the discontinuities in the
projected measure destroy any kind of $W^{1,p}$ norm, it is natural to look for BV bounds. Notice
by the way that, for this kind of applications, proving $\int_\Omega |\nabla\bar\varrho| \le \int_\Omega |\nabla g|$ (with no multiplicative
coefficient nor additional term) is crucial in order to iterate this estimate at every step.

The paper is structured as follows: in Section 2 we recall some preliminary results in optimal
transportation, in Section 3 we establish our main inequality, in Section 4 we prove Theorem
1.1, while in Section 5 we collect some properties of solutions of (1.3) which can be interesting in
their own right and we prove Theorem 1.2. Eventually, in Section 6 we present some applications
of the above results, connections with other variational and evolution problems and some open
questions.

Acknowledgments. The authors would like to thank the referee for a careful reading of the
manuscript and for her/his comments. The second and third author gratefully acknowledge the
support of the ANR project ANR-12-MONU-0013 ISOTACE.


2. Notations and preliminaries 

In this section we collect some facts about optimal transport that we will need in the sequel,
referring the reader to [25] for more details. We will denote by $\mathcal{P}(\Omega)$ the set of probability
measures in $\Omega$ and by $\mathcal{P}_2(\Omega)$ the subset of $\mathcal{P}(\Omega)$ given by those with finite second moment (i.e.
$\mu \in \mathcal{P}_2(\Omega)$ if and only if $\int |x|^2 \, d\mu < \infty$). We will also use the spaces $\mathcal{M}(\Omega)$ of finite measures
on $\Omega$ and $L^1_+(\Omega)$ of non-negative functions in $L^1$. Notice $\{f \in L^1_+(\Omega) : \int f(x) \, dx = 1\} =
L^1_+(\Omega) \cap \mathcal{P}(\Omega)$. In the sequel we will always identify an absolutely continuous measure with its
density (for instance writing $T_\# f$ for $T_\#(f \, dx)$ and so on).

Theorem 2.1. Let $\Omega \subset \mathbb{R}^d$ be a given convex set and let $\varrho, g \in L^1_+(\Omega)$ be two probability densities
on $\Omega$. Then the following hold:

(i) The problem

(2.1) $\frac{1}{2} W_2^2(\varrho, g) := \min \left\{ \int_{\Omega \times \Omega} \frac{1}{2} |x - y|^2 \, d\gamma : \gamma \in \Pi(\varrho, g) \right\},$

where $\Pi(\varrho, g)$ is the set of transport plans, i.e. $\Pi(\varrho \, dx, g \, dx) := \{\gamma \in \mathcal{P}(\Omega \times \Omega) : (\pi^x)_\# \gamma =
\varrho, \ (\pi^y)_\# \gamma = g\}$, has a unique solution, which is of the form $\gamma_T := (\mathrm{id}, T)_\# \varrho$, and
$T : \Omega \to \Omega$ is a solution of the problem

(2.2) $\min_{T_\# \varrho = g} \int_\Omega \frac{1}{2} |x - T(x)|^2 \varrho(x) \, dx.$


(ii) The map $T : \{\varrho > 0\} \to \{g > 0\}$ is a.e. invertible and its inverse $S := T^{-1}$ is a solution
of the problem

(2.3) $\min_{S_\# g = \varrho} \int_\Omega \frac{1}{2} |x - S(x)|^2 g(x) \, dx.$


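(iii) $W_2(\cdot, \cdot)$ is a distance on the space $\mathcal{P}_2(\Omega)$ of probabilities over $\Omega$ with finite second moment.

(iv) We have

(2.4) $\frac{1}{2} W_2^2(\varrho, g) = \max \left\{ \int_\Omega \varphi(x) \varrho(x) \, dx + \int_\Omega \psi(y) g(y) \, dy : \varphi(x) + \psi(y) \le \frac{1}{2} |x - y|^2, \ \forall x, y \in \Omega \right\}.$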

(v) The optimal functions $\varphi, \psi$ in (2.4) are continuous, differentiable almost everywhere,
Lipschitz if $\Omega$ is bounded, and such that:

• $T(x) = x - \nabla\varphi(x)$ and $S(x) = x - \nabla\psi(x)$ for a.e. $x \in \Omega$; in particular, the
gradients of the optimal functions are uniquely determined (even in case of
non-uniqueness of $\varphi$ and $\psi$) a.e. on $\{\varrho > 0\}$ and $\{g > 0\}$, respectively;

• the functions

$x \mapsto \frac{|x|^2}{2} - \varphi(x)$ and $x \mapsto \frac{|x|^2}{2} - \psi(x)$

are convex in $\Omega$ and hence $\varphi$ and $\psi$ are semi-concave;

• $\varphi(x) = \min_{y \in \Omega} \left\{ \frac{1}{2} |x - y|^2 - \psi(y) \right\}$ and $\psi(y) = \min_{x \in \Omega} \left\{ \frac{1}{2} |x - y|^2 - \varphi(x) \right\}$;

• if we denote by $\chi^c$ the $c$-transform of a function $\chi : \Omega \to \mathbb{R}$ defined through $\chi^c(y) =
\inf_{x \in \Omega} \frac{1}{2} |x - y|^2 - \chi(x)$, then the maximal value in (2.4) is also equal to

(2.5) $\max \left\{ \int_\Omega \varphi(x) \varrho(x) \, dx + \int_\Omega \varphi^c(y) g(y) \, dy, \ \varphi \in C^0(\Omega) \right\}$

and the optimal $\varphi$ is the same $\varphi$ as above, and is such that $\varphi = (\varphi^c)^c$ a.e. on
$\{\varrho > 0\}$.

(vi) If $g \in \mathcal{P}_2(\Omega)$ is given, the functional $W : \mathcal{P}_2(\Omega) \to \mathbb{R}$ defined through

$W(\varrho) = \frac{1}{2} W_2^2(\varrho, g) = \max \left\{ \int_\Omega \varphi(x) \varrho(x) \, dx + \int_\Omega \varphi^c(y) g(y) \, dy, \ \varphi \in C^0(\Omega) \right\}$

is convex. Moreover, if $\{g > 0\}$ is a connected open set we can choose a particular
potential $\hat\varphi$, defined as

$\hat\varphi(x) = \inf \left\{ \frac{1}{2} |x - y|^2 - \psi(y) : y \in \mathrm{spt}(g) \right\},$

where $\psi$ is the unique (up to additive constants) optimal function $\psi$ in (2.4) (i.e. $\hat\varphi$ is
the $c$-transform of $\psi$ computed on $\Omega \times \mathrm{spt}(g)$). With this choice, if $\chi = \tilde\varrho - \varrho$ is the
difference between two probability measures, then we have

$\lim_{\varepsilon \to 0} \frac{W(\varrho + \varepsilon\chi) - W(\varrho)}{\varepsilon} = \int_\Omega \hat\varphi \, d\chi.$

As a consequence, $\hat\varphi$ is the first variation of $W$.


The only non-standard point is the last one (the computation of the first variation of $W$): it
is sketched in [5], and a more detailed presentation will be part of [24] (Section 7.2). Uniqueness
of $\psi$ on $\mathrm{spt}(g)$ is obtained from the uniqueness of its gradient and the connectedness of $\{g > 0\}$.
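To make the $c$-transform used in points (v) and (vi) of Theorem 2.1 concrete, here is a small discrete sketch. It replaces $\Omega$ by a finite grid of $[0, 1]$ and uses an arbitrary test function; both are assumptions made for illustration only. It checks two standard facts: the double transform dominates the original function, and a third transform returns the first one, so functions that are already $c$-transforms are left unchanged by the double transform.

```python
import math


def c_transform(values, grid):
    """Discrete c-transform: chi^c(y) = min_x (|x - y|^2 / 2 - chi(x)) over the grid."""
    return [
        min(0.5 * (x - y) ** 2 - v for x, v in zip(grid, values))
        for y in grid
    ]


grid = [i / 50.0 for i in range(51)]      # sample points of [0, 1]
chi = [math.sin(6.0 * x) for x in grid]   # arbitrary test function, not from the text

chi_c = c_transform(chi, grid)
chi_cc = c_transform(chi_c, grid)
chi_ccc = c_transform(chi_cc, grid)

gap_cc = min(b - a for a, b in zip(chi, chi_cc))            # nonnegative up to rounding
gap_ccc = max(abs(a - b) for a, b in zip(chi_c, chi_ccc))   # zero up to rounding
print(gap_cc, gap_ccc)
```

On a continuum domain the same two facts are what lies behind the identity $\varphi = (\varphi^c)^c$ for the optimal potential; the grid version only illustrates the mechanism.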


We also need some regularity results on optimal transport maps, see [8, 9]. 

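Theorem 2.2. Let $\Omega \subset \mathbb{R}^d$ be a bounded uniformly convex set with smooth boundary and let
$\varrho, g \in L^1_+(\Omega)$ be two probability densities on $\Omega$ away from zero and infinity². Then, using the
notations from Theorem 2.1, we have:

(i) $T \in C^{0,\alpha}(\overline\Omega)$ and $S \in C^{0,\alpha}(\overline\Omega)$.

(ii) If $\varrho \in C^{k,\beta}(\overline\Omega)$ and $g \in C^{k,\beta}(\overline\Omega)$, then $T \in C^{k+1,\beta}(\overline\Omega)$ and $S \in C^{k+1,\beta}(\overline\Omega)$.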

Most of our proofs will be done by approximation. To do this, we need a stability result.

Theorem 2.3. Let $\Omega \subset \mathbb{R}^d$ be a bounded convex set and let $\varrho_n \in L^1_+(\Omega)$ and $g_n \in L^1_+(\Omega)$ be two
sequences of probability densities in $\Omega$. Then, using the notations from Theorem 2.1, if $\varrho_n \rightharpoonup \varrho$
and $g_n \rightharpoonup g$ weakly as measures, then we have:

(i) $W_2(\varrho, g) = \lim_{n \to \infty} W_2(\varrho_n, g_n)$.

(ii) there exist two semi-concave functions $\varphi_\infty, \psi_\infty$ such that $\nabla\varphi_n \to \nabla\varphi_\infty$ and $\nabla\psi_n \to
\nabla\psi_\infty$ a.e., with $\nabla\varphi_\infty = \nabla\varphi$ a.e. on $\{\varrho > 0\}$ and $\nabla\psi_\infty = \nabla\psi$ a.e. on $\{g > 0\}$.

If $\Omega$ is unbounded (for instance $\Omega = \mathbb{R}^d$), then the convergence $\varrho_n \rightharpoonup \varrho$ and $g_n \rightharpoonup g$ weakly as
measures is not enough to guarantee (i) but only implies $W_2(\varrho, g) \le \liminf_{n \to \infty} W_2(\varrho_n, g_n)$. Yet,
(i) is satisfied if $W_2(\varrho_n, \varrho), W_2(g_n, g) \to 0$, which is a stronger condition.

Proof. The proof of (i) can be found in [25]. We prove (ii). (Actually this is a consequence of
Theorem 3.3.3 from [10], but for the sake of completeness we sketch its simple proof.)

We first note that due to Theorem 2.1 (v) the sequences $\varphi_n$ and $\psi_n$ are equi-continuous.
Moreover, since the Kantorovich potentials are uniquely determined up to a constant we may
suppose that there is $x_0 \in \Omega$ such that $\varphi_n(x_0) = \psi_n(x_0) = 0$ for every $n \in \mathbb{N}$. Thus, $\varphi_n$ and $\psi_n$
are locally uniformly bounded in $\Omega$ and, by the Ascoli-Arzelà Theorem, they converge uniformly
up to a subsequence

$\varphi_n \to \varphi_\infty$ and $\psi_n \to \psi_\infty$ as $n \to \infty$,

to some continuous functions $\varphi_\infty, \psi_\infty \in C(\overline\Omega)$, satisfying

$\varphi_\infty(x) + \psi_\infty(y) \le \frac{1}{2} |x - y|^2$, for every $x, y \in \Omega$.

In order to show that $\varphi_\infty$ and $\psi_\infty$ are precisely Kantorovich potentials, we use the
characterization of the potentials as solutions to the problem (2.4). Indeed, let $\varphi$ and $\psi$ be such that
$\varphi(x) + \psi(y) \le \frac{1}{2} |x - y|^2$ for every $x, y \in \Omega$. Then, for every $n \in \mathbb{N}$ we have

$\int_\Omega \varphi_n(x) \varrho_n(x) \, dx + \int_\Omega \psi_n(y) g_n(y) \, dy \ge \int_\Omega \varphi(x) \varrho_n(x) \, dx + \int_\Omega \psi(y) g_n(y) \, dy,$

and passing to the limit we obtain

$\int_\Omega \varphi_\infty(x) \varrho(x) \, dx + \int_\Omega \psi_\infty(y) g(y) \, dy \ge \int_\Omega \varphi(x) \varrho(x) \, dx + \int_\Omega \psi(y) g(y) \, dy,$

which proves that $\varphi_\infty$ and $\psi_\infty$ are optimal. In particular, the gradients of these functions coincide
with those of $\varphi$ and $\psi$ on the sets where the densities are strictly positive.

² We say that $\varrho$ and $g$ are away from zero and infinity if there is some $\varepsilon > 0$ such that $\varepsilon \le \varrho \le 1/\varepsilon$ and
$\varepsilon \le g \le 1/\varepsilon$ a.e. in $\Omega$.

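We now prove that $\nabla\varphi_n \to \nabla\varphi_\infty$ a.e. in $\Omega$. We denote by $\mathcal{N} \subset \Omega$ the set of points $x \in \Omega$
such that there is a function among $\varphi_\infty$ and $\varphi_n$, for $n \in \mathbb{N}$, which is not differentiable at $x$. We
note that by Theorem 2.1 (v) the set $\mathcal{N}$ has Lebesgue measure zero. Let now $x_0 \in \Omega \setminus \mathcal{N}$ and
suppose, without loss of generality, $x_0 = 0$. Setting

$\alpha_n(x) := \frac{|x|^2}{2} - \varphi_n(x) + \varphi_n(0) + x \cdot \nabla\varphi_\infty(0)$ and $\alpha(x) := \frac{|x|^2}{2} - \varphi_\infty(x) + \varphi_\infty(0) + x \cdot \nabla\varphi_\infty(0),$

we have that $\alpha_n$ are all convex and such that $\alpha_n(0) = 0$, and hence $\alpha_n(x) \ge \nabla\alpha_n(0) \cdot x$. Moreover,
$\alpha_n \to \alpha$ locally uniformly and $\nabla\alpha(0) = 0$. Suppose by contradiction that $\lim_{n \to \infty} \nabla\alpha_n(0) \ne 0$.
Then, there is a unit vector $p \in \mathbb{R}^d$ and a constant $\delta > 0$ such that, up to a subsequence,
$p \cdot \nabla\alpha_n(0) \ge \delta$ for every $n > 0$. Then, for every $t > 0$ we have

$\frac{\alpha(pt)}{t} = \lim_{n \to \infty} \frac{\alpha_n(pt)}{t} \ge \liminf_{n \to \infty} \{p \cdot \nabla\alpha_n(0)\} \ge \delta,$

which is a contradiction with the fact that $\nabla\alpha(0) = 0$. □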


In order to handle our approximation procedures, we also need to spend some words on the
notion of $\Gamma$-convergence (see [11]).

Definition 2.1. On a metric space $X$ let $F_n : X \to \mathbb{R} \cup \{+\infty\}$ be a sequence of functions. We
define the two lower-semicontinuous functions $F^-$ and $F^+$ (called $\Gamma$-liminf and $\Gamma$-limsup of
this sequence, respectively) by

$F^-(x) := \inf \{ \liminf_{n \to \infty} F_n(x_n) : x_n \to x \},$

$F^+(x) := \inf \{ \limsup_{n \to \infty} F_n(x_n) : x_n \to x \}.$

Should $F^-$ and $F^+$ coincide, then we say that $F_n$ actually $\Gamma$-converges to the common value
$F = F^- = F^+$.


This means that, when one wants to prove $\Gamma$-convergence of $F_n$ towards a given functional
$F$, one actually has to prove two distinct facts: first we need $F^- \ge F$ (this is called the $\Gamma$-liminf
inequality, i.e. we need to prove $\liminf_n F_n(x_n) \ge F(x)$ for any approximating sequence $x_n \to x$)
and then $F^+ \le F$ (this is called the $\Gamma$-limsup inequality, i.e. we need to find a recovery sequence
$x_n \to x$ such that $\limsup_n F_n(x_n) \le F(x)$).

The definition of $\Gamma$-convergence for a continuous parameter $\varepsilon \to 0$ obviously passes through
the convergence to the same limit for any subsequence $\varepsilon_n \to 0$.

Among the properties of $\Gamma$-convergence we have the following:

• if there exists a compact set $K \subset X$ such that $\inf_X F_n = \inf_K F_n$ for any $n$, then $F$
attains its infimum and $\inf F_n \to \min F$,

• if $(x_n)_n$ is a sequence of minimizers for $F_n$ admitting a subsequence converging to $x$,
then $x$ minimizes $F$ (in particular, if $F$ has a unique minimizer $x$ and the sequence of
minimizers $x_n$ is compact, then $x_n \to x$),

• if $F_n$ is a sequence $\Gamma$-converging to $F$, then $F_n + G$ will $\Gamma$-converge to $F + G$ for any
continuous function $G : X \to \mathbb{R} \cup \{+\infty\}$.

In the sequel we will need the following two easy criteria to guarantee $\Gamma$-convergence.

Proposition 2.4. If each $F_n$ is l.s.c. and $F_n \to F$ uniformly, then $F_n$ $\Gamma$-converges to $F$.

If each $F_n$ is l.s.c., $F_n \le F_{n+1}$ and $F(x) = \lim_n F_n(x)$ for all $x$, then $F_n$ $\Gamma$-converges to $F$.

We will essentially apply the notion of $\Gamma$-convergence in the space $X = \mathcal{P}(\Omega)$ endowed with
the weak convergence³ (which is indeed metrizable on this bounded subset of the Banach space
of measures) since if, instead, we endowed the space $\mathcal{P}_2(\Omega)$ with the $W_2$ convergence, then we
would lack compactness whenever $\Omega$ is not compact itself.
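A one-dimensional toy instance of the monotone criterion may help. The functions below are hypothetical and chosen only for illustration: $F_n(x) = n \max(0, x - 1)^2$ is continuous and increasing in $n$, and its pointwise limit is $0$ for $x \le 1$ and $+\infty$ for $x > 1$; adding the continuous term $G(x) = (x - 2)^2$, the minimizers of $F_n + G$ should approach the constrained minimizer $x = 1$.

```python
def penalized(n, x):
    """F_n(x) + G(x) with F_n(x) = n * max(0, x - 1)^2 and G(x) = (x - 2)^2."""
    return n * max(0.0, x - 1.0) ** 2 + (x - 2.0) ** 2


def argmin_convex(f, lo, hi, steps=200):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(steps):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


minimizers = {
    n: argmin_convex(lambda x, n=n: penalized(n, x), 0.0, 3.0)
    for n in (1, 10, 100, 10000)
}
for n, x_n in minimizers.items():
    # Setting the derivative to zero for x > 1 gives x_n = (n + 2) / (n + 1).
    print(n, x_n, (n + 2.0) / (n + 1.0))
```

The closed form $x_n = (n + 2)/(n + 1)$ tends to 1, in line with the property that minimizers of a $\Gamma$-converging sequence accumulate at minimizers of the limit.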


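We conclude this section with the following simple lemma concerning properties of the
functional

$\mathcal{M}(\Omega) \ni \varrho \mapsto \mathcal{H}(\varrho) := \begin{cases} \int_\Omega h(\varrho(x)) \, dx, & \text{if } \varrho \ll dx, \\ +\infty, & \text{otherwise.} \end{cases}$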

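Lemma 2.5. Let $\Omega$ be an open set and $h : \mathbb{R} \to \mathbb{R} \cup \{+\infty\}$ be convex, l.s.c. and superlinear
at $+\infty$, then the functional $\mathcal{H} : \mathcal{M}(\Omega) \to \mathbb{R} \cup \{+\infty\}$ is convex and lower semicontinuous with
respect to the weak convergence of measures. Moreover if $h \in C^1$ then we have

$\lim_{\varepsilon \to 0} \frac{\mathcal{H}(\varrho + \varepsilon\chi) - \mathcal{H}(\varrho)}{\varepsilon} = \int h'(\varrho) \, d\chi$

whenever $\varrho, \chi \ll dx$, $\mathcal{H}(\varrho) < +\infty$ and $\mathcal{H}(\varrho + \varepsilon\chi) < +\infty$ at least for small $\varepsilon$. As a consequence,
$h'(\varrho)$ is the first variation of $\mathcal{H}$.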

For this classical fact, and in particular for the semicontinuity, we refer to [4] and [3]. 

We also use this lemma, together with point (vi) in Theorem 2.1 to deduce the following 
optimality conditions. 

Corollary 2.6. Let $\Omega$ be a bounded open set, $g \in L^1_+(\Omega)$ an absolutely continuous and strictly
positive probability density on $\Omega$, the potential $\hat\varphi$ and the functional $W$ defined as in point (vi)
in Theorem 2.1. Let $h : \mathbb{R} \to \mathbb{R}$ be a $C^1$ convex and superlinear function, and let $\mathcal{H} : \mathcal{M}(\Omega) \to
\mathbb{R} \cup \{+\infty\}$ be defined as above. Suppose that $\bar\varrho$ solves the minimization problem

$\min \{ W(\varrho) + \mathcal{H}(\varrho) : \varrho \in \mathcal{P}(\Omega) \}.$

Then there exists a constant $C$ such that

$h'(\bar\varrho) = \max\{ (C - \hat\varphi), h'(0) \}.$

The proof of this fact is contained in [5] and in Section 7.2.3 of [24]. We give a sketch here. 

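Proof. Take an arbitrary competitor $\tilde\varrho$, define $\varrho_\varepsilon := (1 - \varepsilon)\bar\varrho + \varepsilon\tilde\varrho$ and $\chi = \tilde\varrho - \bar\varrho$ and write the
optimality condition

$0 \le \lim_{\varepsilon \to 0} \frac{(\mathcal{H} + W)(\bar\varrho + \varepsilon\chi) - (\mathcal{H} + W)(\bar\varrho)}{\varepsilon}.$

This implies

$\int (\hat\varphi + h'(\bar\varrho)) \, d\tilde\varrho \ge \int (\hat\varphi + h'(\bar\varrho)) \, d\bar\varrho$

for any arbitrary competitor $\tilde\varrho$. This means that there is a constant $C$ such that $\hat\varphi + h'(\bar\varrho) \ge C$
with $\hat\varphi + h'(\bar\varrho) = C$ on $\{\bar\varrho > 0\}$. The claim is just a re-writing of this fact, distinguishing the set
where $\bar\varrho > 0$ (and hence $h'(\bar\varrho) \ge h'(0)$) and the set where $\bar\varrho = 0$. □

³ We say that a family of probability measures $\mu_n$ weakly converges to a probability measure $\mu$ in $\Omega$ if $\int \varphi \, d\mu_n \to
\int \varphi \, d\mu$ for all $\varphi \in C_b(\Omega)$, where $C_b(\Omega)$ is the space of continuous and bounded functions on $\Omega$.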


3. The main inequality 


In this section we establish the key inequality needed in the proof of Theorems 1.1 and 1.2. 

Lemma 3.1. Suppose that $\varrho, g \in L^1_+(\Omega)$ are smooth probability densities, which are bounded away
from 0 and infinity, $\Omega \subset \mathbb{R}^d$ a bounded and uniformly convex domain and let $H \in C^2(\Omega)$ be a
convex function. Then we have the following inequality

(3.1) $\int_\Omega \left( \varrho \, \nabla \cdot [\nabla H(\nabla\varphi)] - g \, \nabla \cdot [\nabla H(-\nabla\psi)] \right) dx \le 0,$

where $(\varphi, \psi)$ is a choice of Kantorovich potentials.


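Proof. We first note that since $\varrho$ and $g$ are smooth and away from zero and infinity in $\Omega$, Theorem
2.2 implies that $\varphi, \psi$ are smooth as well. Now using the identity $S(T(x)) = x$ and that $S_\# g = \varrho$
we get

$\int_\Omega \varrho(x) \, \nabla \cdot [\nabla H(\nabla\varphi(x))] \, dx = \int_\Omega g(x) \, [\nabla \cdot [\nabla H(\nabla\varphi)]](S(x)) \, dx$

$= \int_\Omega g(x) \, \nabla \cdot [\nabla H(\nabla\varphi \circ S)](x) \, dx + \int_\Omega g(x) \left( [\nabla \cdot [\nabla H(\nabla\varphi)]](S(x)) - \nabla \cdot [\nabla H(\nabla\varphi \circ S)](x) \right) dx,$

and, by the equality

$-\nabla\psi(x) = S(x) - x = S(x) - T(S(x)) = \nabla\varphi(S(x)),$

we obtain

$\int_\Omega \left( \varrho \, \nabla \cdot [\nabla H(\nabla\varphi)] - g \, \nabla \cdot [\nabla H(-\nabla\psi)] \right) dx = \int_\Omega g(x) \left( [\nabla \cdot [\nabla H(\nabla\varphi)]](S(x)) - \nabla \cdot [\nabla H(\nabla\varphi \circ S)](x) \right) dx$

(3.2) $= \int_\Omega \varrho(x) \left( \nabla \cdot [\nabla H(\nabla\varphi)] - [\nabla \cdot (\nabla H(\nabla\varphi) \circ S)] \circ T \right) dx.$

For simplicity we set

(3.3) $E = \nabla \cdot (\nabla H(\nabla\varphi)) - [\nabla \cdot (\nabla H(\nabla\varphi) \circ S)] \circ T = \nabla \cdot \xi - [\nabla \cdot (\xi \circ S)] \circ T,$

where by $\xi$ we denote the continuously differentiable function

$\xi(x) = (\xi^1, \dots, \xi^d) := \nabla H(\nabla\varphi(x)),$

whose derivative is given by

$D\xi = D(\nabla H(\nabla\varphi)) = D^2 H(\nabla\varphi) \cdot D^2\varphi.$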

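We now calculate

(3.4) $[\nabla \cdot (\xi \circ S)] \circ T = \sum_{i=1}^d \frac{\partial(\xi^i \circ S)}{\partial x_i} \circ T = \sum_{i=1}^d \sum_{j=1}^d \left( \frac{\partial \xi^i}{\partial x_j}(S) \, \frac{\partial S^j}{\partial x_i} \right) \circ T$

$= \mathrm{tr}\left( D\xi \cdot (DT)^{-1} \right) = \mathrm{tr}\left( D^2 H(\nabla\varphi) \cdot D^2\varphi \cdot (I_d - D^2\varphi)^{-1} \right),$

where the last two equalities follow by $DS \circ T = (DT)^{-1}$ and we also used that $(DT)^{-1} =
(I_d - D^2\varphi)^{-1}$, where $I_d$ is the $d$-dimensional identity matrix.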

By (3.3) and (3.4) we have that

$E = \mathrm{tr}\left[ D^2 H(\nabla\varphi) \cdot D^2\varphi \cdot \left( I_d - (I_d - D^2\varphi)^{-1} \right) \right]$

$= -\mathrm{tr}\left[ D^2 H(\nabla\varphi) \cdot [D^2\varphi]^2 \cdot (I_d - D^2\varphi)^{-1} \right].$

Since we have that

$I_d - D^2\varphi \ge 0,$

and that the trace of the product of two positive matrices is positive, we obtain $E \le 0$, which
together with (3.2) concludes the proof. □
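The matrix algebra behind the sign of $E$ can be spelled out as follows. This is only a sketch: the shorthands $A$, $B$ and $M$ are introduced here for illustration and do not appear in the text.

```latex
% Shorthands (assumptions of this sketch):
%   A := D^2\varphi (symmetric),  B := D^2 H(\nabla\varphi) \ge 0,
%   and I_d - A is assumed invertible, as in the smooth setting of the lemma.
\begin{align*}
I_d - (I_d - A)^{-1}
  &= \bigl[(I_d - A) - I_d\bigr](I_d - A)^{-1}
   = -A\,(I_d - A)^{-1}, \\
E &= \operatorname{tr}\bigl[B\,A\,\bigl(I_d - (I_d - A)^{-1}\bigr)\bigr]
   = -\operatorname{tr}\bigl[B\,M\bigr],
   \qquad M := A^2 (I_d - A)^{-1}.
\end{align*}
% A and (I_d - A)^{-1} commute and are symmetric, so M is symmetric;
% its eigenvalues are \lambda^2/(1-\lambda) \ge 0 for each eigenvalue \lambda < 1 of A.
% Hence, with B^{1/2} the symmetric square root of B,
\[
\operatorname{tr}[B\,M] = \operatorname{tr}\bigl[B^{1/2} M B^{1/2}\bigr] \ge 0,
\qquad\text{so}\qquad E \le 0 .
\]
```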


Lemma 3.2. Let $\Omega \subset \mathbb{R}^d$ be bounded and convex, $\varrho, g \in W^{1,1}(\Omega)$ be two probability densities
and $H \in C^2(\mathbb{R}^d)$ be a radially symmetric convex function. Then the following inequality holds

(3.5) $\int_\Omega \left( \nabla\varrho \cdot \nabla H(\nabla\varphi) + \nabla g \cdot \nabla H(\nabla\psi) \right) dx \ge 0,$

where $(\varphi, \psi)$ is a choice of Kantorovich potentials.


Proof. Let us start by observing that, due to the radial symmetry of $H$,

(3.6) $\nabla H(\nabla\psi) = -\nabla H(-\nabla\psi).$

Step 1. Proof in the smooth case. Suppose that the probability densities $\varrho$ and $g$ are smooth
and bounded away from zero and infinity and that $\Omega$ is uniformly convex. As in Lemma 3.1,
we note that under these assumptions on $\varrho$ and $g$ the Kantorovich potentials are smooth, hence
after integration by parts the left hand side of (3.5) becomes

$\int_\Omega \left( \nabla\varrho \cdot \nabla H(\nabla\varphi) + \nabla g \cdot \nabla H(\nabla\psi) \right) dx$

$= \int_{\partial\Omega} \left( \varrho \, \nabla H(\nabla\varphi) \cdot n + g \, \nabla H(\nabla\psi) \cdot n \right) d\mathcal{H}^{d-1} - \int_\Omega \left( \varrho \, \nabla \cdot [\nabla H(\nabla\varphi)] + g \, \nabla \cdot [\nabla H(\nabla\psi)] \right) dx$

$\ge \int_{\partial\Omega} \left( \varrho \, \nabla H(\nabla\varphi) + g \, \nabla H(\nabla\psi) \right) \cdot n \, d\mathcal{H}^{d-1},$

where we used Lemma 3.1 and (3.6). Moreover, by the radial symmetry of $H$ one has $\nabla H(z) =
c(z) z$, for some $c(z) \ge 0$. Since the gradients of the Kantorovich potentials $\nabla\varphi$ and $\nabla\psi$ calculated
in boundary points are pointing outward (since $T(x) = x - \nabla\varphi(x) \in \Omega$, and $S(x) = x -
\nabla\psi(x) \in \Omega$) we have that

$\nabla H(\nabla\varphi(x)) \cdot n(x) \ge 0$ and $\nabla H(\nabla\psi(x)) \cdot n(x) \ge 0$, $\forall x \in \partial\Omega$,

which concludes the proof of (3.5) if $\varrho$ and $g$ are smooth.


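Step 2. Withdrawing smoothness and uniform convexity assumptions. We first note that for
every $\varepsilon > 0$ there is a sequence of uniformly convex domains $\Omega_\varepsilon$ such that $\Omega \subset \Omega_\varepsilon \subset \Omega'$ (where $\Omega'$
is a larger fixed convex domain) and $|\Omega_\varepsilon \setminus \Omega| \to 0$, together with smooth nonnegative functions
$\varrho_\varepsilon \in C^1(\Omega')$ and $g_\varepsilon \in C^1(\Omega')$ such that

$\varrho_\varepsilon \to \varrho$ and $g_\varepsilon \to g$ as $\varepsilon \to 0$.

We will suppose that both $\varrho_\varepsilon$ and $g_\varepsilon$ are probability densities on $\Omega_\varepsilon$. Moreover, by adding
a positive constant and then multiplying by another one, we may assume that $\varrho_\varepsilon$ and $g_\varepsilon$ are
probability densities away from zero:

$\varrho_\varepsilon \ge \varepsilon$, $g_\varepsilon \ge \varepsilon$ and $\int_{\Omega_\varepsilon} \varrho_\varepsilon \, dx = \int_{\Omega_\varepsilon} g_\varepsilon \, dx = 1.$

Let $\varphi_\varepsilon \in C^{2,\beta}(\overline{\Omega_\varepsilon})$ and $\psi_\varepsilon \in C^{2,\beta}(\overline{\Omega_\varepsilon})$ be the Kantorovich potentials corresponding to the optimal
transport maps between $\varrho_\varepsilon$ and $g_\varepsilon$. By Step 1 we have

(3.7) $\int_{\Omega_\varepsilon} \left( \nabla\varrho_\varepsilon \cdot \nabla H(\nabla\varphi_\varepsilon) + \nabla g_\varepsilon \cdot \nabla H(\nabla\psi_\varepsilon) \right) dx \ge 0.$

Note that from the boundedness of $\Omega_\varepsilon$ we infer $|\nabla\varphi_\varepsilon|, |\nabla\psi_\varepsilon| \le C$. Moreover, $\nabla H$ is locally
bounded, which also implies $|\nabla H(\nabla\varphi_\varepsilon)|, |\nabla H(\nabla\psi_\varepsilon)| \le C$. On the other hand, from $|\Omega_\varepsilon \setminus \Omega| \to 0$,
supposing that the convergence $\nabla\varrho_\varepsilon \to \nabla\varrho$ and $\nabla g_\varepsilon \to \nabla g$ holds a.e. and is dominated, when
we pass to the limit as $\varepsilon \to 0$ the integral restricted to $\Omega_\varepsilon \setminus \Omega$ is negligible. On $\Omega$ we use Theorem
2.3, the bounds on $|\nabla H(\nabla\varphi_\varepsilon)|, |\nabla H(\nabla\psi_\varepsilon)|$ and

$\nabla\varphi_\varepsilon \to \nabla\varphi$ and $\nabla\psi_\varepsilon \to \nabla\psi$ as $\varepsilon \to 0$.

Passing to the limit as $\varepsilon \to 0$ in (3.7) we obtain (3.5), which concludes the proof. □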


Remark 3.1. In Lemma 3.2 we can drop the convexity assumption on $\Omega$ if $\varrho, g$ have compact
support: indeed, it is enough to choose a ball $D$ containing the supports of $\varrho$ and $g$.


Remark 3.2. Lemma 3.2 also remains true in the case of compactly supported densities $\varrho$ and
$g$, even if we drop the radiality assumption $H(z) = H(|z|)$. In this case the inequality becomes

$\int_\Omega \left( \nabla\varrho \cdot \nabla H(\nabla\varphi) - \nabla g \cdot \nabla H(-\nabla\psi) \right) dx \ge 0.$

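Proof. The proof follows the same scheme as that of Lemma 3.2, first in the smooth case and
then by approximation. We select a convex domain $\Omega$ large enough to contain the supports of $\varrho$
and $g$ in its interior: all the integrations and integrations by parts are performed on $\Omega$. The only
difficulty is that we cannot guarantee the boundary term to be positive. Yet, we first take $\varrho, g$
to be smooth and we approximate them by taking $\varrho_\varepsilon := \varepsilon \frac{1}{|\Omega|} + (1 - \varepsilon)\varrho$ and $g_\varepsilon := \varepsilon \frac{1}{|\Omega|} + (1 - \varepsilon) g$.
For these densities and their corresponding potentials $\varphi_\varepsilon, \psi_\varepsilon$, we obtain the inequality

$\int_\Omega \left( \nabla\varrho_\varepsilon \cdot \nabla H(\nabla\varphi_\varepsilon) - \nabla g_\varepsilon \cdot \nabla H(-\nabla\psi_\varepsilon) \right) dx \ge \int_{\partial\Omega} \left( \varrho_\varepsilon \nabla H(\nabla\varphi_\varepsilon) - g_\varepsilon \nabla H(-\nabla\psi_\varepsilon) \right) \cdot n \, d\mathcal{H}^{d-1}.$

We can pass to the limit (by dominated convergence as before) in this inequality, and notice
that the r.h.s. tends to 0, since $|\nabla H(\nabla\varphi_\varepsilon)|, |\nabla H(-\nabla\psi_\varepsilon)| \le C$ and $\varrho_\varepsilon = g_\varepsilon = \varepsilon/|\Omega|$ on $\partial\Omega$. Once
the inequality is proven for smooth $\varrho, g$, a new approximation gives the desired result. □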

We observe that a particular case of Lemma 3.2, which we present here as a corollary, could
have been obtained in a very different way.

Corollary 3.3. Let $\Omega \subset \mathbb{R}^d$ be a given bounded convex set and $\varrho, g \in W^{1,1}(\Omega)$ be two probability
densities. Then the following inequality holds

(3.8) $\int_\Omega \left( \nabla\varrho \cdot \nabla\varphi + \nabla g \cdot \nabla\psi \right) dx \ge 0,$

where $\varphi$ and $\psi$ are the corresponding Kantorovich potentials.

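Proof. The inequality (3.8) follows by setting $H(z) := \frac{1}{2} |z|^2$ in Lemma 3.2. Nevertheless, in
this particular case, there is an alternate proof, using the geodesic convexity of the entropy
functional, which we sketch below for $\Omega = \mathbb{R}^d$.

Consider the entropy functional $\mathcal{E} : \mathcal{P}_2(\mathbb{R}^d) \to \mathbb{R}$ defined by

$\mathcal{E}(\varrho) = \begin{cases} \int_{\mathbb{R}^d} \varrho \log \varrho \, dx, & \text{if } \varrho \ll \mathcal{L}^d, \\ +\infty, & \text{otherwise,} \end{cases}$

and the geodesic

$[0, 1] \ni t \mapsto \varrho_t \in \mathcal{P}_2(\mathbb{R}^d), \quad \varrho_0 = \varrho, \quad \varrho_1 = g,$

in the Wasserstein space $(\mathcal{P}_2(\mathbb{R}^d), W_2)$. It is well known (see, for example, [2]) that the map
$t \mapsto \mathcal{E}(\varrho_t)$ is convex and that $\varrho_t$ solves the continuity equation

$\partial_t \varrho_t + \nabla \cdot (\varrho_t v_t) = 0, \quad \varrho_0 = \varrho, \quad \varrho_1 = g,$

associated to the vector field $v_t = (T - \mathrm{id}) \circ ((1 - t)\mathrm{id} + tT)^{-1}$ induced by the optimal transport
map $T = \mathrm{id} - \nabla\varphi$ between $\varrho$ and $g$. Now since the time derivative of $\mathcal{E}(\varrho_t)$ is increasing, we get

$-\int_{\mathbb{R}^d} \nabla\varrho \cdot \nabla\varphi \, dx = \int_{\mathbb{R}^d} \varrho \, v_0 \cdot \frac{\nabla\varrho}{\varrho} \, dx = \frac{d}{dt}\Big|_{t=0} \mathcal{E}(\varrho_t) \le \frac{d}{dt}\Big|_{t=1} \mathcal{E}(\varrho_t) = \int_{\mathbb{R}^d} g \, v_1 \cdot \frac{\nabla g}{g} \, dx = \int_{\mathbb{R}^d} \nabla g \cdot \nabla\psi \, dx,$

which proves the claim. □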


By approximating $H(z) = |z|$ with $H(z) = \sqrt{\varepsilon^2 + |z|^2}$, Lemma 3.2 has the following useful
corollary, where we use the convention $\frac{z}{|z|} = 0$ for $z = 0$.

Corollary 3.4. Let $\Omega \subset \mathbb{R}^d$ be a given bounded convex set and $\varrho, g \in W^{1,1}(\Omega)$ be two probability
densities. Then the following inequality holds

(3.9) $\int_\Omega \left( \nabla\varrho \cdot \frac{\nabla\varphi}{|\nabla\varphi|} + \nabla g \cdot \frac{\nabla\psi}{|\nabla\psi|} \right) dx \ge 0,$

where $\varphi$ and $\psi$ are the corresponding Kantorovich potentials.


4. BV estimates for minimizers

In this section we prove Theorem 1.1. Since we will need to perform several approximation
arguments, and we want to use $\Gamma$-convergence, we need to provide uniqueness of the minimizers.
The following easy lemma is well-known among specialists.

Lemma 4.1. Let $g \in \mathcal{P}(\Omega) \cap L^1_+(\Omega)$; then the functional $\varrho \mapsto W_2^2(\varrho, g)$ is strictly convex on
$\mathcal{P}_2(\Omega)$.

Proof. Suppose by contradiction that there exist $\varrho_0 \ne \varrho_1$ and $t \in ]0, 1[$ such that

$W_2^2(\varrho_t, g) = (1 - t) W_2^2(\varrho_0, g) + t W_2^2(\varrho_1, g),$

where $\varrho_t = (1 - t)\varrho_0 + t\varrho_1$. Let $\gamma_0$ be the optimal transport plan in the transport from $\varrho_0$ to $g$
(pay attention to the direction: it is a transport map if we see it backward: from $g$ to $\varrho_0$). As the
starting measure is absolutely continuous, by Brenier's Theorem, $\gamma_0$ is of the form $(T_0, \mathrm{id})_\# g$.
Analogously, take $\gamma_1 = (T_1, \mathrm{id})_\# g$ optimal from $\varrho_1$ to $g$. Set $\gamma_t := (1 - t)\gamma_0 + t\gamma_1 \in \Pi(\varrho_t, g)$. We
have

$(1 - t) W_2^2(\varrho_0, g) + t W_2^2(\varrho_1, g) = W_2^2(\varrho_t, g) \le \int |x - y|^2 \, d\gamma_t = (1 - t) \int |x - y|^2 \, d\gamma_0 + t \int |x - y|^2 \, d\gamma_1$

$= (1 - t) W_2^2(\varrho_0, g) + t W_2^2(\varrho_1, g),$

which implies that $\gamma_t$ is actually optimal in the transport from $g$ to $\varrho_t$. Yet $\gamma_t$ is not induced
from a transport map, unless $T_0 = T_1$ a.e. on $\{g > 0\}$. This is a contradiction with $\varrho_0 \ne \varrho_1$ and
proves strict convexity. □

Let us denote by $\mathcal{C}$ the class of convex l.s.c. functions $h : \mathbb{R}_+ \to \mathbb{R} \cup \{+\infty\}$, finite in a
neighborhood of 0 and with finite right derivative $h'(0)$ at 0, and superlinear at $+\infty$.

Lemma 4.2. If $h \in \mathcal{C}$ there exists a sequence of $C^2$ convex functions $h_n$, superlinear at $\infty$, with
$h_n'' > 0$, $h_n \le h_{n+1}$ and $h(x) = \lim_n h_n(x)$ for every $x \in \mathbb{R}_+$.

Moreover, if $h : \mathbb{R}_+ \to \mathbb{R} \cup \{+\infty\}$ is a convex l.s.c. superlinear function, there exists a
sequence of functions $h_n \in \mathcal{C}$ with $h_n \le h_{n+1}$ and $h(x) = \lim_n h_n(x)$ for every $x \in \mathbb{R}_+$.

Proof. Let us start from the case $h \in \mathcal{C}$. Set $\ell^+ := \sup\{x : h(x) < +\infty\} \in \mathbb{R}_+ \cup \{+\infty\}$. Let us
define an increasing function $\xi_n : \mathbb{R} \to \mathbb{R}$ in the following way:

\[ \xi_n(x) := \begin{cases}
h'(0) & \text{for } x \in\, ]-\infty, 0], \\
h'(x) & \text{for } x \in [0, \ell^+ - \tfrac1n], \\
h'(\ell^+ - \tfrac1n) & \text{for } \ell^+ - \tfrac1n < x \le \ell^+, \\
h'(\ell^+ - \tfrac1n) + n(x - \ell^+) & \text{for } x > \ell^+,
\end{cases} \]

where, if the derivative of $h$ does not exist somewhere, we just replace it with the right derivative.
(Notice that when $\ell^+ = +\infty$, the last two cases do not apply.)
Let $q \ge 0$ be a $C^1$ function with $\mathrm{spt}(q) \subset [-1, 0]$, $\int q(t)\,dt = 1$ and let us set $q_n(t) = n\,q(nt)$.
We define $h_n$ as the primitive of the $C^1$ function

\[ h_n'(x) := \int \Big( \xi_n(t) - \frac1n e^{-t} \Big)\, q_n(t - x)\,dt, \]

with $h_n(0) = h(0)$. It is easy to check that all the required properties are satisfied: we have
$h_n''(x) \ge \frac1n e^{-x}$, $h_n$ is superlinear because $\lim_{x\to\infty} \xi_n(x) = +\infty$, and we have increasing
convergence $h_n \to h$.

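For the case of a generic function $h$, it is possible to approximate it with functions in $\mathcal{C}$ if we
define $\ell^- := \inf\{x : h(x) < +\infty\} \in \mathbb{R}_+$ and take

\[ h_n(x) := \begin{cases}
h(\ell^- + \tfrac1n) + h'(\ell^- + \tfrac1n)\,(x - \ell^- - \tfrac1n) + n\,|x - \ell^-| & \text{for } x \le \ell^-, \\
h(\ell^- + \tfrac1n) + h'(\ell^- + \tfrac1n)\,(x - \ell^- - \tfrac1n) & \text{for } x \in\, ]\ell^-, \ell^- + \tfrac1n], \\
h(x) & \text{for } x \ge \ell^- + \tfrac1n.
\end{cases} \]

In this case as well, it is easy to check that all the required properties are satisfied. □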


Proof of Theorem 1.1. 


Proof. Let us start from the case where $g$ is $W^{1,1}$ and bounded from below, and $h$ is $C^2$, superlinear, with $h'' > 0$, and $\Omega$ is a bounded convex set. A minimizer $\varrho$ exists (by the compactness
of $\mathcal{P}_2(\Omega)$ and by the lower semicontinuity of the functional with respect to the weak convergence
of measures). Thanks to Corollary 2.6, there exists a Kantorovich potential $\varphi$ for the transport
from $\varrho$ to $g$ such that $h'(\varrho) = \max\{C - \varphi, h'(0)\}$. This shows that $h'(\varrho)$ is Lipschitz continuous.
Hence, $\varrho$ is bounded. On bounded sets $h'$ is a diffeomorphism with Lipschitz inverse, thanks to
$h'' > 0$, which proves that $\varrho$ itself is Lipschitz. Then we can apply Corollary 3.4 and get

\[ \int_\Omega \left( \nabla\varrho \cdot \frac{\nabla\varphi}{|\nabla\varphi|} + \nabla g \cdot \frac{\nabla\psi}{|\nabla\psi|} \right) dx \ge 0. \]

Yet, a.e. on $\{\nabla\varrho \neq 0\}$ we have $\varrho > 0$ and hence $h'(\varrho) = C - \varphi$. Using also $h'' > 0$, we get that $\nabla\varphi$ and $\nabla\varrho$
are vectors with opposite directions. Hence we have

\[ \int_\Omega |\nabla\varrho|\,dx \le \int_\Omega \nabla g \cdot \frac{\nabla\psi}{|\nabla\psi|}\,dx \le \int_\Omega |\nabla g|\,dx, \]

which is the desired estimate.

We can generalize to $h \in \mathcal{C}$ by using the previous lemma and approximating it with a sequence
$h_n$. Thanks to monotone convergence we have $\Gamma$-convergence for the minimization problem that
we consider. We also have compactness since $\Omega$ is compact, and uniqueness of the minimizer.

Hence, the minimizers $\varrho_n$ corresponding to $h_n$ satisfy $\int_\Omega |\nabla\varrho_n| \le \int_\Omega |\nabla g|$ and converge to the
minimizer $\varrho$ corresponding to $h$. By the semicontinuity of the total variation we conclude the
proof in this case.

Similarly, we can generalize to other convex functions $h$, approximating them with functions
in $\mathcal{C}$ (notice that this is only interesting if the function $h$ allows the existence of at least a
probability density with finite cost, i.e. if $h(1/|\Omega|) < +\infty$). Also, we can take $g \in BV$ and
approximate it with $W^{1,1}$ functions bounded from below. If the approximation is done for
instance by convolution, then we have a sequence with $W_2(g_n, g) \to 0$, which guarantees uniform
convergence of the functionals, and hence $\Gamma$-convergence.

We can also handle the case where $\Omega$ is unbounded and convex, by first taking $g$ to be such
that its support is a convex bounded set, and $h \in \mathcal{C}$. In this case the optimal $\varrho$ must be compactly
supported as well. Indeed, the optimality condition $h'(\varrho) = \max\{C - \varphi, h'(0)\}$ imposes $\varrho = 0$
on the set where $\varphi > C - h'(0)$. But on $\{\varrho > 0\}$ we have $\varphi = \psi^c$, where $\psi$ is the Kantorovich
potential defined on $\mathrm{spt}(g)$, which is bounded. Hence $\varphi$ grows at infinity quadratically, from
$\varphi(x) = \inf_{y \in \mathrm{spt}(g)} \frac12 |x - y|^2 - \psi(y)$, which implies that there is no point $x$ with $\varrho(x) > 0$ too far
from $\mathrm{spt}(g)$. Once we know that the densities are compactly supported, the same arguments
as above apply (note that, $\Omega$ being convex, we can assume that the densities are supported on a
bounded convex set). Then one passes to the limit obtaining the result for any generic convex
function $h$, and then we can also approximate $g$ (as above, we select a sequence $g_n$ of compactly
supported densities converging to $g$ in $W_2$). Notice that in this case the convergence is no longer
uniform on $\mathcal{P}_2(\Omega)$, but it is uniform on a bounded set $\{W_2(\cdot, g) \le C\}$, which is the only one
interesting in the minimization. □


5. Projected measures under density constraints 


5.1. Existence, uniqueness, characterization, stability of the projected measure. In
this section we let $\Omega \subset \mathbb{R}^d$ be a given closed set with negligible boundary, $f : \Omega \to [0, +\infty[$
a measurable function in $L^1_{loc}(\Omega)$ with $\int_\Omega f\,dx \ge 1$ and $g \in \mathcal{P}_2(\Omega)$ a given probability measure
on $\Omega$. We will consider the following projection problem

(5.1) $\displaystyle \min_{\varrho \in K_f} W_2^2(\varrho, g),$

where we set $K_f = \{\varrho \in L^1_+(\Omega) : \int_\Omega \varrho\,dx = 1,\ \varrho \le f\}$.

This section is devoted to the study of the above projection problem. We first want to
summarize the main known results. Most of these results are only available in the case $f = 1$.

Existence. The existence of a solution to Problem (5.1) is a consequence of the direct
method of calculus of variations. Indeed, take a minimizing sequence $\varrho_n$; it is tight thanks to
the bound $W_2(\varrho_n, g) \le C$; it admits a weakly converging subsequence and the limit minimizes
the functional $W_2(\cdot, g)$ because of its semicontinuity and of the fact that the inequality $\varrho \le f$ is
preserved. We note that from the existence point of view, the case $f = 1$ and the general case
do not show any significant difference.

Characterization. The optimality conditions, derived in [22] exploiting the strategy developed in [16] (in the case $f = 1$, but they are easy to adapt to the general case) state the following:
if $\varrho$ is a solution to the above problem and $\varphi$ is a Kantorovich potential in the transport from $\varrho$
to $g$, then there exists a threshold $\ell \in \mathbb{R}$ such that

\[ \varrho(x) \begin{cases}
= f(x), & \text{if } \varphi(x) < \ell, \\
= 0, & \text{if } \varphi(x) > \ell, \\
\in [0, f(x)], & \text{if } \varphi(x) = \ell.
\end{cases} \]

In particular, this shows that $\nabla\varphi = 0$ $\varrho$-a.e. on $\{\varrho < f\}$ and, since $T(x) = x - \nabla\varphi(x)$, that the
optimal transport $T$ from $\varrho$ to $g$ is the identity on such set. If $g = g\,dx$ is absolutely continuous,
then one can write the Monge-Ampère equation

\[ \det(DT(x)) = \varrho(x)/g(T(x)) \]

and deduce $\varrho(x) = g(T(x)) = g(x)$ a.e. on $\{\varrho < f\}$. This suggests a sort of saturation result for
the optimal $\varrho$, i.e. $\varrho(x)$ is either equal to $g(x)$ or to $f(x)$ (but one has to pay attention to the
case $\varrho = 0$ and also to assume that $g$ is absolutely continuous).
Uniqueness. For absolutely continuous measures $g = g\,dx$ and generic $f$ the uniqueness of
the projection follows by Lemma 4.1. In the specific case $f = 1$ and $\Omega$ convex the uniqueness
was proved in [16, 22] by a completely different method. In this case, as observed by A. Figalli,
one can use displacement convexity along generalized geodesics. This means that if $\varrho^0$ and $\varrho^1$
are two solutions, one can take for every $t \in [0,1]$ the convex combination $T^t = (1-t)T^0 + tT^1$
of the optimal transport maps $T^i$ from $g$ to $\varrho^i$ and the curve $t \mapsto \varrho^t := ((1-t)T^0 + tT^1)_\# g$ in
$\mathcal{P}_2$, interpolating from $\varrho^0$ to $\varrho^1$. It can be proven that $\varrho^t$ still satisfies $\varrho^t \le 1$ (but this can not
be adapted to $f$, unless $f$ is concave) and that $W_2^2(\varrho^t, g) < (1-t)W_2^2(\varrho^0, g) + tW_2^2(\varrho^1, g)$,
which is a contradiction to the minimality. The assumption on $g$ can be relaxed but we need
to ensure the existence of optimal transport maps: what we need to assume is that $g$ gives
no mass to "small" sets (i.e. $(d-1)$-dimensional); see [14] for the sharp assumptions and
notions about this issue. Thanks to this uniqueness result, we can define a projection operator
$P_{K_1} : \mathcal{P}_2(\Omega) \cap L^1(\Omega) \to \mathcal{P}_2(\Omega) \cap L^1(\Omega)$ through

\[ P_{K_1}[g] := \mathrm{argmin}\{W_2^2(\varrho, g) : \varrho \in K_1\}. \]

Stability. From the same displacement interpolation idea, A. Roudneff-Chupin also proved
([22]) that the projection is Hölder continuous with exponent 1/2 for the $W_2$ distance whenever
$\Omega$ is a compact convex set. We do not develop the proof here, we just refer to Proposition 2.3.4
of [22]. Notice that the constant in the Hölder continuity depends a priori on the diameter of
$\Omega$. However, to be more precise, the following estimate is obtained (for $g^0$ and $g^1$ absolutely
continuous)

(5.2) $\displaystyle W_2^2(P_{K_1}[g^0], P_{K_1}[g^1]) \le W_2^2(g^0, g^1) + W_2(g^0, g^1)\big(\mathrm{dist}(g^0, K_1) + \mathrm{dist}(g^1, K_1)\big),$

which shows that, even on unbounded domains, we have a local Hölder behavior.

In the rest of the section, we want to recover similar results in the largest possible generality,
i.e. for general $f$, and without the assumptions on $g$ and $\Omega$.

We will first get a saturation characterization for the projections, which will allow for a general 
uniqueness result. Continuity will be an easy corollary. 

In order to proceed, we first need the following lemma. 

Lemma 5.1. Let $\varrho$ be a solution of Problem (5.1). Let moreover $\gamma \in \Pi(\varrho, g)$ be the optimal
plan from $\varrho$ to $g$. If $(x_0, y_0) \in \mathrm{spt}(\gamma)$ then $\varrho = f$ a.e. in $B(y_0, R)$, where $R = |y_0 - x_0|$.

Proof. Let us suppose that this is not true and there exists a compact set $K \subset B(y_0, R)$ with
positive Lebesgue measure such that $\varrho < f$ a.e. in $K$. Let $\varepsilon := \mathrm{dist}(\partial B(y_0, R), K) > 0$.

By the definition of the support, for all $r > 0$ we have that

\[ 0 < \gamma(B(x_0, r) \times B(y_0, r)) \le \int_{B(x_0,r)} \varrho\,dx \le \int_{B(x_0,r)} f\,dx. \]

By the absolute continuity of the integral, for $r > 0$ small enough there exists $0 < \alpha < 1$ such
that

\[ \gamma(B(x_0, r) \times B(y_0, r)) = \alpha \int_K (f - \varrho)\,dx =: \alpha m. \]

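Now we construct the following measures $\tilde\gamma, \eta$ on $\Omega \times \Omega$ as

\[ \tilde\gamma := \gamma - \gamma \llcorner (B(x_0,r) \times B(y_0,r)) + \eta \quad \text{and} \quad \eta := \alpha (f - \varrho)\,dx \llcorner K \otimes (\pi^y)_\# \gamma \llcorner (B(x_0,r) \times B(y_0,r)). \]

It is immediate to check that $(\pi^y)_\# \tilde\gamma = g$. On the other hand

\[ \tilde\varrho := (\pi^x)_\# \tilde\gamma = \varrho - (\pi^x)_\#\big(\gamma \llcorner (B(x_0,r) \times B(y_0,r))\big) + \alpha (f - \varrho) \llcorner K \le f \]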
is an admissible competitor in Problem (5.1) and we have the following

\[ W_2^2(\tilde\varrho, g) \le \int_{\Omega \times \Omega} |x - y|^2\,d\tilde\gamma(x,y) \]
\[ \le W_2^2(\varrho, g) - \int_{B(x_0,r) \times B(y_0,r)} |x - y|^2\,d\gamma(x,y) + \int_{K \times B(y_0,r)} |x - y|^2\,d\eta(x,y) \]
\[ \le W_2^2(\varrho, g) - (R - 2r)^2 \alpha m + (R - \varepsilon + r)^2 \alpha m. \]

Now if we choose $r > 0$ small enough to have $R - 2r > R - \varepsilon + r$, i.e. $r < \varepsilon/3$, we get that

\[ W_2^2(\tilde\varrho, g) < W_2^2(\varrho, g), \]

which is clearly a contradiction, hence the result follows. □


The following proposition establishes uniqueness of the projection on $K_f$ as well as a very
precise description of it. For a given measure $\mu$ we are going to denote by $\mu^{ac}$ the density of its
absolutely continuous part with respect to the Lebesgue measure, i.e.

\[ \mu = \mu^{ac}\,dx + \mu^s, \]

with $\mu^s \perp dx$. The following result recalls corresponding results in the partial transport problem
([13]).

Proposition 5.2. Let $\Omega \subset \mathbb{R}^d$ be a convex set and let $f \in L^1_{loc}(\Omega)$, $f \ge 0$ be such that $\int_\Omega f \ge 1$.
Then, for every probability measure $\mu \in \mathcal{P}_2(\Omega)$, there is a unique solution $\varrho$ of the problem (5.1).

Moreover, $\varrho$ is of the form

(5.3) $\displaystyle \varrho = \mu^{ac}\,1_B + f\,1_{B^c},$

for a measurable set $B \subset \Omega$.

Proof. We first note that by setting $f = 0$ on $\Omega^c$ we can assume that $\Omega = \mathbb{R}^d$. Existence of a
solution in Problem (5.1) follows by the direct methods in the calculus of variations by noticing
that the set $K_f$ is closed with respect to the weak convergence of measures.

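Let us now prove the saturation result (5.3). Let us first premise the following fact: if
$\mu, \nu \in \mathcal{P}(\Omega)$, $\gamma \in \Pi(\mu, \nu)$ and we define the set

\[ A(\gamma) := \{x \in \Omega : \text{the only point } (x,y) \in \mathrm{spt}(\gamma) \text{ is } (x,x)\}, \]

then

(5.4) $\displaystyle \mu \llcorner A(\gamma) \le \nu \llcorner A(\gamma).$

In particular $\mu^{ac} \le \nu^{ac}$ for a.e. $x \in A(\gamma)$. To prove (5.4), let $\phi \ge 0$ and write

\[ \int_{A(\gamma)} \phi\,d\mu = \int \phi(x)\,1_{A(\gamma)}(x)\,d\gamma(x,y) = \int \phi(x)\,1^2_{A(\gamma)}(x)\,d\gamma(x,y) \]
\[ = \int \phi(y)\,1_{A(\gamma)}(y)\,1_{A(\gamma)}(x)\,d\gamma(x,y) \le \int \phi(y)\,1_{A(\gamma)}(y)\,d\gamma(x,y) = \int_{A(\gamma)} \phi\,d\nu, \]

where we used the fact that $\gamma$-a.e. $1_{A(\gamma)}(x) > 0$ implies $x = y$.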

Now, for an optimal transport plan $\gamma \in \Pi(\varrho, \mu)$, let us define

\[ B := \mathrm{Leb}(f) \cap \mathrm{Leb}(\mu^{ac}) \cap \mathrm{Leb}(\varrho) \cap \{\varrho < f\}^{(1)} \cap A(\gamma)^{(1)} \cap A(\tilde\gamma)^{(1)}. \]

Here $\tilde\gamma \in \Pi(\mu, \varrho)$ is the transport plan obtained by seeing $\gamma$ "the other way around", i.e. $\tilde\gamma$ is
the image of $\gamma$ through the map $(x, y) \mapsto (y, x)$, while $\mathrm{Leb}(h)$ is the set of Lebesgue points of $h$
and for a set $A$ we denote by $A^{(1)} := \mathrm{Leb}(1_A)$ the set of its density one points.

Let now $x_0 \in B$ and let us consider the following two cases:

Case 1. $\varrho(x_0) < \mu^{ac}(x_0)$. Since, in particular, $\mu^{ac}(x_0) > 0$ and $x_0 \in \mathrm{Leb}(\mu^{ac})$ we have that
$x_0 \in \mathrm{spt}(\mu)$. From Lemma 5.1 we see that $(y_0, x_0) \in \mathrm{spt}(\gamma)$ implies $y_0 = x_0$. Indeed if this were
not the case there would exist a ball where $\varrho = f$ a.e. and $x_0$ would be in the middle of this ball;
from $x_0 \in \mathrm{Leb}(f) \cap \mathrm{Leb}(\varrho)$ we would get $\varrho(x_0) = f(x_0)$, a contradiction with $x_0 \in B$. Hence, if
we use the set $A(\tilde\gamma)$ defined above with $\nu = \varrho$, we have $x_0 \in A(\tilde\gamma)$. From $x_0 \in \mathrm{Leb}(\mu^{ac}) \cap \mathrm{Leb}(\varrho)$
we get $\mu^{ac}(x_0) \le \varrho(x_0)$, which is a contradiction.

Case 2. $\mu^{ac}(x_0) < \varrho(x_0)$. Exactly as in the previous case we have that $x_0 \in \mathrm{spt}(\varrho)$ and, by
Lemma 5.1, we have again that $(x_0, y_0) \in \mathrm{spt}(\gamma)$ implies $y_0 = x_0$. Indeed, otherwise $x_0$ would
be on the boundary of a ball where $\varrho = f$, a contradiction with $x_0 \in \{\varrho < f\}^{(1)}$. Hence, we get
$x_0 \in A(\gamma)$ and $\varrho(x_0) \le \mu^{ac}(x_0)$, again a contradiction.

Hence we get that $\mu^{ac} = \varrho$ for $x \in B$. By the definition of $B$,

\[ B^c \subset_{a.e.} \{\varrho = f\} \cup A(\gamma)^c \cup A(\tilde\gamma)^c, \]

where a.e. refers to the Lebesgue measure. By applying Lemma 5.1, this implies that $\varrho = f$ a.e.
on $B^c$, and concludes the proof of (5.3).

Uniqueness of the projection is now an immediate consequence of the saturation property
(5.3). Indeed, suppose that $\varrho_0$ and $\varrho_1$ were two different projections of the same measure $\mu$. Define
$\varrho_{1/2} = \frac12 \varrho_0 + \frac12 \varrho_1$. Then, by convexity of $W_2^2(\cdot, \mu)$, we get that $\varrho_{1/2}$ is also optimal. But its
density is not saturated on the set where the densities of $\varrho_0$ and $\varrho_1$ differ, in contradiction with
(5.3). □

Corollary 5.3. For fixed $f$, the map $P_{K_f} : \mathcal{P}_2(\Omega) \to \mathcal{P}_2(\Omega)$ defined through

\[ P_{K_f}[\mu] := \mathrm{argmin}\{W_2^2(\varrho, \mu) : \varrho \in K_f\} \]

is continuous in the following sense: if $\mu_n \to \mu$ for the $W_2$ distance, then $P_{K_f}[\mu_n] \rightharpoonup P_{K_f}[\mu]$ in
the weak convergence.

Moreover, in the case where $f = 1$ and $\Omega$ is a convex set, the projection is also locally
Hölder continuous for $W_2$ on the whole $\mathcal{P}_2(\Omega)$ and satisfies (5.2).

Proof. This is just a matter of compactness and uniqueness. Indeed, take a sequence $\mu_n \to \mu$
and look at $P_{K_f}[\mu_n]$. It is a tight sequence of measures since

(5.5) $\displaystyle W_2(P_{K_f}[\mu_n], \mu) \le W_2(P_{K_f}[\mu_n], \mu_n) + W_2(\mu_n, \mu) \le W_2(\varrho, \mu) + 2\,W_2(\mu_n, \mu),$

where $\varrho \in K_f$ is any admissible measure. Hence we can extract a weakly converging subsequence
to some measure $\tilde\varrho \in K_f$ (recall that $K_f$ is weakly closed). Moreover, by the lower semicontinuity
of $W_2$ with respect to the weak convergence and since $W_2(\mu_n, \mu) \to 0$, passing to the limit in
(5.5) we get

\[ W_2(\tilde\varrho, \mu) \le W_2(\varrho, \mu) \qquad \forall \varrho \in K_f. \]

Uniqueness of the projection implies $\tilde\varrho = P_{K_f}[\mu]$ and thus that the limit is independent of the
extracted subsequence; this proves the desired continuity.
Concerning the second part of the statement, we take arbitrary $g^1$ and $g^2$ (not necessarily
absolutely continuous) and we approximate them in the $W_2$ distance with absolutely continuous
measures $g^i_n$ ($i = 1, 2$; for instance by convolution), then we have, from (5.2)

\[ W_2^2(P_{K_1}[g^1_n], P_{K_1}[g^2_n]) \le W_2^2(g^1_n, g^2_n) + W_2(g^1_n, g^2_n)\big(\mathrm{dist}(g^1_n, K_1) + \mathrm{dist}(g^2_n, K_1)\big), \]

and we can pass to the limit as $n \to \infty$. □

The following technical lemma will be used in the next section and establishes the continuity
of the projection with respect to $f$. To state it let us consider, for given $f \in L^1_{loc}$ and $g \in \mathcal{P}_2(\Omega)$,
the following functional

\[ \mathcal{F}_f(\varrho) := \begin{cases} \frac12 W_2^2(\varrho, g), & \text{if } \varrho \in K_f, \\ +\infty, & \text{otherwise.} \end{cases} \]

Proposition 5.2 can be restated by saying that the functional $\mathcal{F}_f$ has a unique minimizer in
$\mathcal{P}_2(\Omega)$.

Lemma 5.4. Let $f_n, f \in L^1_{loc}(\Omega)$ with $\int_\Omega f_n\,dx \ge 1$, $\int_\Omega f\,dx \ge 1$ and let us assume that $f_n \to f$
in $L^1_{loc}(\Omega)$ and almost everywhere. Also assume $f_n \in \mathcal{P}_2(\Omega)$ if $\int_\Omega f_n\,dx = 1$ and $f \in \mathcal{P}_2(\Omega)$ if
$\int_\Omega f\,dx = 1$. Then, for every $g \in \mathcal{P}_2(\Omega)$,

(i) The sequence $(P_{K_{f_n}}(g))_n$ is tight.

(ii) We have $P_{K_{f_n}}(g) \rightharpoonup P_{K_f}(g)$.

(iii) If $\int_\Omega f > 1$, then $\mathcal{F}_{f_n}$ $\Gamma$-converges to $\mathcal{F}_f$ with respect to the weak convergence of measures.

Proof. Let us denote by $\varrho_n$ the projection $P_{K_{f_n}}(g)$ and let us start from proving its tightness, i.e.
(i). We fix $\varepsilon > 0$: there exists a radius $R_0$ such that $g(B(0, R_0)) > 1 - \frac{\varepsilon}{2}$ and $\int_{B(0,R_0)} f > 1 - \frac{\varepsilon}{2}$.
By $L^1_{loc}$ convergence, there exists $n_0$ such that $\int_{B(0,R_0)} f_n > 1 - \varepsilon$ for $n > n_0$. Now, take
$R > 3R_0$ and suppose $\varrho_n(B(0,R)^c) > \varepsilon$ for some $n > n_0$. Then, the optimal transport $T$ from $\varrho_n$ to
$g$ should move some mass from $B(0,R)^c$ to $B(0,R_0)$. Let us take a point $x_0 \in B(0,R)^c$ such
that $T(x_0) \in B(0,R_0)$. From Lemma 5.1, this means that $\varrho_n = f_n$ on the ball $B(T(x_0), |x_0 - T(x_0)|) \supset B(T(x_0), 2R_0) \supset B(0,R_0)$. But this means $\int_{B(0,R_0)} \varrho_n = \int_{B(0,R_0)} f_n > 1 - \varepsilon$ and
hence $\varrho_n(B(0,R)^c) < \varepsilon$, which is a contradiction. This shows that $\varrho_n$ is tight.

Now, if $\int_\Omega f = 1$, then the weak limit of $\varrho_n$ (up to subsequences) can only be $f$ itself, since
it must be a probability density bounded from above by $f$, and $f = P_{K_f}(g)$. This proves (ii) in
the case $\int_\Omega f = 1$. In the case $\int_\Omega f > 1$, this will be a consequence of (iii). Notice that in this
case we necessarily have $\int_\Omega f_n > 1$ for $n$ large enough.

Let us prove (iii). Since $\varrho_n \le f_n$, $\varrho_n \rightharpoonup \varrho$ and $f_n \to f$ in $L^1_{loc}$ immediately imply that $\varrho \le f$,
the $\Gamma$-liminf inequality simply follows by the lower semicontinuity of $W_2$.

Concerning the $\Gamma$-limsup, we need to prove that every density $\varrho \in \mathcal{P}_2(\Omega)$ with $\varrho \le f$ a.e. can
be approximated by a sequence $\varrho_n \le f_n$ a.e. with $W_2(\varrho_n, g) \to W_2(\varrho, g)$. In order to do this let
us define $\tilde\varrho_n := \min\{\varrho, f_n\}$. Note that $\tilde\varrho_n$ is not admissible since it is not a probability, because
in general $\int \tilde\varrho_n < 1$. Yet, we have $\int \tilde\varrho_n \to 1$ since $\tilde\varrho_n \to \min\{\varrho, f\} = \varrho$ and this convergence
is dominated by $\varrho$. We want to "complete" $\tilde\varrho_n$ so as to get a probability, stay admissible, and
converge to $\varrho$ in $W_2$, since this will imply that $W_2(\varrho_n, g) \to W_2(\varrho, g)$.

Let us select a ball $B$ such that $\int_{B \cap \Omega} f > 1$ and note that we can find $\varepsilon > 0$ such that the set
$\{f > \varrho + \varepsilon\} \cap B$ is of positive measure, i.e. $m := |\{f > \varrho + \varepsilon\} \cap B| > 0$. Since $f_n \to f$ a.e., the
set $B_n := \{f_n > \varrho + \frac{\varepsilon}{2}\} \cap B$ has measure larger than $m/2$ for large $n$. Now take $B'_n \subset B_n$ with
$|B'_n| = \frac{2}{\varepsilon}\big(1 - \int \tilde\varrho_n\big) \to 0$, and define

\[ \varrho_n := \tilde\varrho_n + \frac{\varepsilon}{2}\,1_{B'_n}. \]

By construction, $\int \varrho_n = 1$ and $\varrho_n \le f_n$ a.e. since on $B'_n$ we have $\tilde\varrho_n = \varrho$ and $\varrho + \frac{\varepsilon}{2} < f_n$, while
on the complement of $B'_n$, $\varrho_n = \tilde\varrho_n \le f_n$ a.e. by definition. To conclude the proof we only need to
check $W_2(\varrho_n, \varrho) \to 0$. This is equivalent (see, for instance, [2] or [25]) to

(5.6) $\displaystyle \int \phi\,\varrho_n\,dx \to \int \phi\,\varrho\,dx$

for all continuous functions $\phi$ such that $|\phi| \le C(1 + |x|^2)$. Since $\varrho \in \mathcal{P}_2(\Omega)$ and $\tilde\varrho_n \le \varrho$,
thanks to the dominated convergence theorem it is enough to show that $\int \phi(\varrho_n - \tilde\varrho_n) \to 0$. But
$\varrho_n - \tilde\varrho_n$ converges to 0 in $L^1$ and it is supported in $B'_n \subset B$. Since $\phi$ is bounded on $B$ we obtain
the desired conclusion. □


Remark 5.1. Let us conclude this section with the following open question: for $f = 1$ the
projection is continuous and we can even provide Hölder bounds on $P_{K_1}$. The question whether
$P_{K_1}$ is 1-Lipschitz, as far as we know, is open. Let us underline that some sort of 1-Lipschitz
results have been proven in [6] for solutions of similar variational problems, but they seem impossible
to adapt to this framework.

For the case $f \neq 1$ even the continuity of the projection with respect to the Wasserstein
distance seems delicate.


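5.2. BV estimates for $P_{K_f}$. In this section, we prove Theorem 1.2. Notice that the case $f = 1$
has already been proven as a particular case of Theorem 1.1. To handle the general case, we
develop a slightly different strategy, based on the standard idea to approximate $L^\infty$ bounds with
$L^p$ penalizations.

Let $m \in \mathbb{N}$ and let us assume that $\inf f > 0$; for $g \in \mathcal{P}_2(\Omega)$, we define the approximating
functionals $\mathcal{F}_m : L^1_+(\Omega) \to \mathbb{R} \cup \{+\infty\}$ by

\[ \mathcal{F}_m(\varrho) := \frac12 W_2^2(\varrho, g) + \frac{1}{m+1} \int_\Omega \Big(\frac{\varrho}{f}\Big)^{m+1} dx + \frac{\varepsilon_m}{2} \int_\Omega \Big(\frac{\varrho}{f}\Big)^{2} dx \]

and the limit functional $\mathcal{F}$ as

\[ \mathcal{F}(\varrho) := \begin{cases} \frac12 W_2^2(\varrho, g), & \text{if } \varrho \in K_f, \\ +\infty, & \text{otherwise.} \end{cases} \]

Here $\varepsilon_m \downarrow 0$ is a small parameter to be chosen later.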


Lemma 5.5. Let $\Omega \subset \mathbb{R}^d$ and $f : \Omega \to (0, +\infty)$ be a measurable function, bounded from below
and from above by positive constants, and let $g \in \mathcal{P}_2(\Omega)$. Then:

(i) There are unique minimizers $\varrho$, $\varrho_m$ in $L^1(\Omega)$ for each of the functionals $\mathcal{F}$ and $\mathcal{F}_m$,
respectively.

(ii) The family of functionals $\mathcal{F}_m$ $\Gamma$-converges for the weak convergence of probability measures to $\mathcal{F}$, and the minimizers $\varrho_m$ weakly converge to $\varrho$, as $m \to \infty$.





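(iii) The minimizers $\varrho_m$ of $\mathcal{F}_m$ satisfy

(5.7) $\displaystyle \varphi_m + \Big(\frac{\varrho_m}{f}\Big)^{m} \frac{1}{f} + \varepsilon_m \Big(\frac{\varrho_m}{f}\Big) \frac{1}{f} = 0,$

for a suitable Kantorovich potential $\varphi_m$ in the transport from $\varrho_m$ to $g$.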

Proof. Existence and uniqueness of minimizers of $\mathcal{F}$ have been established in Proposition 5.2.
Existence of minimizers of $\mathcal{F}_m$ is again a simple application of the direct methods in the calculus
of variations and uniqueness follows from strict convexity.

Let us prove the $\Gamma$-convergence in (ii). In order to prove the $\Gamma$-liminf inequality, let $\varrho_m \rightharpoonup \varrho$.
If $\mathcal{F}_m(\varrho_m) \le C$, then for every $m_0 \le m$ and every finite measure set $A \subset \Omega$, we have

\[ \|\varrho_m / f\|_{L^{m_0}(A)} \le |A|^{\frac{1}{m_0} - \frac{1}{m+1}} \big(C(m+1)\big)^{\frac{1}{m+1}}. \]

If we pass to the limit $m \to \infty$, from $\frac{\varrho_m}{f} \rightharpoonup \frac{\varrho}{f}$, we get $\|\varrho / f\|_{L^{m_0}(A)} \le |A|^{\frac{1}{m_0}}$. Letting $m_0$ go to
infinity we obtain $\|\varrho / f\|_{L^\infty} \le 1$, i.e. $\varrho \in K_f$. Since

\[ \mathcal{F}_m(\varrho_m) \ge \frac12 W_2^2(\varrho_m, g), \]

the lower semicontinuity of $W_2^2$ with respect to weak convergence proves the $\Gamma$-liminf inequality.

In order to prove the $\Gamma$-limsup, we use the constant sequence $\varrho_m = \varrho$ as a recovery sequence.
Since we can assume $\varrho \le f$ (otherwise there is nothing to prove, since $\mathcal{F}(\varrho) = +\infty$), it is clear
that the second and third parts of the functional tend to 0, thus proving the desired inequality.

The last part of the statement finally follows from Theorem 2.1 (vi) and Lemma 2.5, exactly
as in Corollary 2.6. □
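
The passage from the $L^{m_0}$ bounds to the $L^\infty$ bound used in the $\Gamma$-liminf argument can be illustrated numerically. The following is a minimal sketch and not part of the original argument: the piecewise constant function `u` (standing for a ratio $\varrho/f$), the cell width and the list of exponents are hypothetical choices made only for illustration. On a set $A$ of measure 1 the norms $\|u\|_{L^p(A)}$ are nondecreasing in $p$ and approach $\max u$.

```python
def lp_norm(values, cell_width, p):
    """L^p norm of a piecewise constant function equal to values[i] on cells of equal width."""
    integral = sum(abs(v) ** p for v in values) * cell_width
    return integral ** (1.0 / p)


# Hypothetical ratio u = rho / f on A = [0, 1], split into 4 cells of width 1/4.
u = [0.2, 0.9, 1.0, 0.5]
width = 0.25

# Norms for increasing exponents; they increase towards the sup norm of u.
norms = [lp_norm(u, width, p) for p in (1, 2, 8, 32, 128)]
sup_norm = max(u)
```

In this toy case the bound $\|u\|_{L^p(A)} \le |A|^{1/p}$ for every $p$ corresponds exactly to $\sup u \le 1$, which is the constraint defining $K_f$.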

Proof of Theorem 1.2 


Proof. Clearly we can assume that $TV(g, \Omega)$ and $TV(f, \Omega)$ are finite and that $\int_\Omega f > 1$ since
otherwise the conclusion is trivial.


Step 1. Assume that the support of $g$ is compact, that $f \in C^\infty(\Omega)$ is bounded from above and
below by positive constants, and let $\varrho_m$ be the minimizer of $\mathcal{F}_m$. As in the proof of Theorem
1.1, we can use the optimality condition (5.7) to prove that $\varrho_m$ is compactly supported. Also, the
same condition implies that $\varrho_m$ is Lipschitz continuous. Indeed, we can write (5.7) as

\[ \varphi_m f + H'_m\Big(\frac{\varrho_m}{f}\Big) = 0, \]

where $H_m(t) = \frac{t^{m+1}}{m+1} + \frac{\varepsilon_m}{2} t^2$. Since $H_m$ is smooth and convex and $H''_m$ is bounded from
below by a positive constant, $H'_m$ is invertible and

\[ \varrho_m = f \cdot (H'_m)^{-1}(-\varphi_m f), \]

where $(H'_m)^{-1}$ is Lipschitz continuous. Since $\varphi_m$ and $f$ are locally Lipschitz, this gives Lipschitz
continuity for $\varrho_m$ on a neighborhood of its support.
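Taking the derivative of the optimality condition (5.7) we obtain

\[ \nabla\varphi_m + \left( m \Big(\frac{\varrho_m}{f}\Big)^{m-1} + \varepsilon_m \right) \frac{f \nabla\varrho_m - \varrho_m \nabla f}{f^3} - \left( \Big(\frac{\varrho_m}{f}\Big)^{m} + \varepsilon_m \frac{\varrho_m}{f} \right) \frac{\nabla f}{f^2} = 0. \]

Rearranging the terms we have

\[ \nabla\varphi_m + A \nabla\varrho_m - B \nabla f = 0, \]

where by $A$ and $B$ we denote the (positive!) functions

\[ A := \left( m \Big(\frac{\varrho_m}{f}\Big)^{m-1} + \varepsilon_m \right) \frac{1}{f^2} \quad \text{and} \quad B := \left( m \Big(\frac{\varrho_m}{f}\Big)^{m-1} + \varepsilon_m \right) \frac{\varrho_m}{f^3} + \left( \Big(\frac{\varrho_m}{f}\Big)^{m} + \varepsilon_m \frac{\varrho_m}{f} \right) \frac{1}{f^2}. \]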

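Now we will use the inequality from Corollary 3.4 for $\varrho_m$ and $g$ in the form

\[ \int_\Omega |\nabla\varrho_m|\,dx \le \int_\Omega |\nabla g|\,dx + \int_\Omega \nabla\varrho_m \cdot \left( \frac{\nabla\varrho_m}{|\nabla\varrho_m|} + \frac{\nabla\varphi_m}{|\nabla\varphi_m|} \right) dx. \]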

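In order to estimate the second integral on the right-hand side we use the inequality

(5.8) $\displaystyle \left| \frac{a}{|a|} - \frac{b}{|b|} \right| \le \left| \frac{a}{|a|} - \frac{b}{|a|} \right| + \left| \frac{b}{|a|} - \frac{b}{|b|} \right| = \frac{|a - b|}{|a|} + \frac{\big| |b| - |a| \big|}{|a|} \le \frac{2}{|a|}\,|a - b|$

for all non-zero $a, b \in \mathbb{R}^d$ (that we apply to $a = A \nabla\varrho_m$ and $b = -\nabla\varphi_m$), and we obtain

\[ \int_\Omega |\nabla\varrho_m|\,dx \le \int_\Omega |\nabla g|\,dx + \int_\Omega |\nabla\varrho_m| \left| \frac{A \nabla\varrho_m}{|A \nabla\varrho_m|} + \frac{\nabla\varphi_m}{|\nabla\varphi_m|} \right| dx \]
\[ \le \int_\Omega |\nabla g|\,dx + 2 \int_\Omega \frac{1}{A}\,|A \nabla\varrho_m + \nabla\varphi_m|\,dx \]
\[ = \int_\Omega |\nabla g|\,dx + 2 \int_\Omega \frac{B}{A}\,|\nabla f|\,dx. \]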
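
Inequality (5.8) is elementary, and a quick numerical sanity check is easy to write. The sketch below is an illustration only and is not taken from the paper: the dimension (3), the number of samples and the random seed are arbitrary assumptions. It samples random non-zero vectors $a, b$ and records the smallest observed value of the right-hand side minus the left-hand side, which should never be negative.

```python
import math
import random


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in v))


def direction_gap(a, b):
    """Left-hand side of (5.8): |a/|a| - b/|b||."""
    na, nb = norm(a), norm(b)
    return norm([x / na - y / nb for x, y in zip(a, b)])


def bound(a, b):
    """Right-hand side of (5.8): 2|a - b| / |a|."""
    return 2.0 * norm([x - y for x, y in zip(a, b)]) / norm(a)


rng = random.Random(0)  # fixed seed so that the check is reproducible
samples = []
for _ in range(1000):
    a = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    if norm(a) > 1e-9 and norm(b) > 1e-9:  # (5.8) is stated for non-zero vectors
        samples.append((a, b))

# Smallest slack between the two sides over all samples.
worst_slack = min(bound(a, b) - direction_gap(a, b) for a, b in samples)
```

Such a check is of course no substitute for the two-line proof contained in (5.8) itself (triangle inequality, then the reverse triangle inequality).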


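We must now estimate the ratio $B/A$. If we denote by $\lambda$ the ratio $\varrho_m / f$ we may write

\[ \frac{B}{A} = \lambda + \lambda\,\frac{\varepsilon_m + \lambda^{m-1}}{\varepsilon_m + m \lambda^{m-1}} \le \lambda \Big( 1 + \frac{1}{m} \Big) + \frac{\varepsilon_m \lambda}{\varepsilon_m + m \lambda^{m-1}}. \]

Now, consider that

\[ \max_{\lambda \in \mathbb{R}_+} \frac{\varepsilon_m \lambda}{\varepsilon_m + m \lambda^{m-1}} = \frac{m-2}{m-1} \left( \frac{\varepsilon_m}{m(m-2)} \right)^{1/(m-1)} =: \delta_m \]

is a quantity depending on $m$ and tending to 0 if $\varepsilon_m$ is chosen small enough (it suffices that $\varepsilon_m^{1/(m-1)} \to 0$). This gives

\[ \int_\Omega |\nabla\varrho_m|\,dx \le \int_\Omega |\nabla g|\,dx + 2 \Big( 1 + \frac{1}{m} \Big) \int_\Omega \frac{\varrho_m}{f}\,|\nabla f|\,dx + 2 \delta_m \int_\Omega |\nabla f|\,dx. \]

In the limit, as $m \to +\infty$, we obtain

\[ \int_\Omega |\nabla\varrho|\,dx \le \int_\Omega |\nabla g|\,dx + 2 \int_\Omega \frac{\varrho}{f}\,|\nabla f|\,dx. \]

Using the fact that $\varrho \le f$, we get

\[ \int_\Omega |\nabla\varrho|\,dx \le \int_\Omega |\nabla g|\,dx + 2 \int_\Omega |\nabla f|\,dx. \]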


Step 2. To treat the case $g, f \in BV_{loc}(\Omega)$ we proceed by approximation as in the proof of
Theorem 1.1. To do this we just note that Corollary 5.3 and Lemma 5.4 give the desired
continuity property of the projection with respect to both $g$ and $f$; lower semicontinuity of the
total variation with respect to the weak convergence then implies the conclusion. □

Remark 5.2. We conclude this section by underlining that the constant 2 in inequality (1.4)
can not be replaced by any smaller constant. Indeed if $\Omega = \mathbb{R}$, $f = 1_{\mathbb{R}_+}$, $g = \frac1n 1_{[-n,0]}$ then
$\varrho = P_{K_f}(g) = 1_{[0,1]}$ and $\int |\nabla\varrho| = 2$, $\int |\nabla f| = 1$, $\int |\nabla g| = \frac{2}{n}$.
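
The arithmetic behind this remark can be sketched as follows. This is only an illustration under stated assumptions: the helper names `total_variation` and `required_constant` are hypothetical, step functions on $\mathbb{R}$ are encoded by their successive constant values from left to right, and the projection $\varrho = 1_{[0,1]}$ is taken from the remark rather than computed. For each $n$ the code returns the smallest constant $c$ for which $\int |\nabla\varrho| \le \int |\nabla g| + c \int |\nabla f|$ holds in this example.

```python
from fractions import Fraction


def total_variation(heights):
    """Total variation on R of a step function given by its successive constant values."""
    return sum(abs(b - a) for a, b in zip(heights, heights[1:]))


def required_constant(n):
    """Smallest c with TV(rho) <= TV(g) + c * TV(f) in the example of Remark 5.2."""
    tv_f = total_variation([0, 1])                  # f = indicator of [0, +inf)
    tv_g = total_variation([0, Fraction(1, n), 0])  # g = (1/n) * indicator of [-n, 0]
    tv_rho = total_variation([0, 1, 0])             # rho = indicator of [0, 1] (from the remark)
    return (tv_rho - tv_g) / tv_f


# The required constant equals 2 - 2/n, which approaches 2 as n grows.
constants = [required_constant(n) for n in (1, 10, 100, 1000)]
```

Since $2 - \frac{2}{n}$ gets arbitrarily close to 2, no constant smaller than 2 can work for all $n$.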

6. Applications 

In this section we discuss some applications of Theorems 1.1 and 1.2 and we present some 
open problems. 

6.1. Partial transport. The projection problem on $K_f$ is a particular case of the so-called
partial transport problem, see [12, 13]. Indeed, the problem is to transport $g$ to a part of the
measure $f$, which is a measure with mass larger than 1. As typical in the partial transport
problem, the solution has an active region, which is given by $f$ restricted to a certain set. This
set satisfies a sort of interior ball condition, with a radius depending on the distance between
each point and its image. In the partial transport case some regularity ($C^{1,\alpha}$) is known for the
optimal map away from the intersection of the supports of the two measures.

A natural question is how to apply the technique that we developed here in the framework 
of more general partial transport problems (in general, both measures could have mass larger 
than 1 and could be transported only partially), and/or whether results or ideas from partial 
transport could be translated into the regularity of the free boundary in the projection. 

6.2. Shape optimization. If we take a set $A \subset \mathbb{R}^d$ with $|A| < 1$ and finite second moment
$\int_A |x|^2\,dx < +\infty$, a natural question is which is the set $B$ with volume 1 such that the uniform
probability density on $B$ is closest to that on $A$. This means solving a shape optimization
problem of the form

\[ \min\Big\{ W_2^2\Big(1_B, \frac{1}{|A|} 1_A\Big) : |B| = 1 \Big\}. \]

The considerations in Section 5.1 show that solving such a problem is equivalent to solving

\[ \min\Big\{ W_2^2\Big(\varrho, \frac{1}{|A|} 1_A\Big) : \varrho \in \mathcal{P}_2(\mathbb{R}^d),\ \varrho \le 1 \Big\} \]

and that the optimal $\varrho$ is of the form $\varrho = 1_B$, $B \supset A$. Also, from our Theorem 1.2 (with $f = 1$),
we deduce that if $A$ is of finite perimeter, then the same is true for $B$, and $\mathrm{Per}(B) \le \frac{1}{|A|} \mathrm{Per}(A)$
(i.e. the perimeter is bounded by the Cheeger ratio of $A$).

It is interesting to compare this problem and its perimeter bound with the problem studied
in [19], which has the same ingredients in a different order. More precisely, here we minimize the
Wasserstein distance and we try to get information on the perimeter, whereas in [19] the functional
to be minimized is a combination of Wi and the perimeter. Hence, the techniques to prove any 
kind of results are different, because here $W_2^2$ cannot be considered as a lower order perturbation
of the perimeter.

As a consequence, many natural questions arise: if $A$ is a nice closed set, can we say that $B$
contains $A$ in its interior? If $A$ is convex, is $B$ convex? What about the regularity of $\partial B$?
6.3. Set evolution problems. Consider the following problem. For a given set $A \subset \mathbb{R}^d$ we
define $\varrho_0 = 1_A$. For a time interval $[0, T]$ and a time step $\tau > 0$ (and $N + 1 := [T/\tau]$) we consider
the following scheme: $\varrho_0^\tau := \varrho_0$ and

(6.1) $\displaystyle \varrho^\tau_{k+1} := P_{K_1}\big[(1 + \tau)\varrho^\tau_k\big], \quad k \in \{0, \dots, N - 1\},$

(here we extend the notion of Wasserstein distance and projection to measures with the same
mass, even if different from 1: in particular, the mass of $\varrho^\tau_k$ will be $|A|(1 + \tau)^k$ and at every step
we project $(1+\tau)\varrho^\tau_k$ on the set of finite positive measures with the same mass as $(1+\tau)\varrho^\tau_k$ and with density
bounded by 1, and we still denote this set by $K_1$ and the projection operator in the sense of the
quadratic Wasserstein distance onto this set by $P_{K_1}$). We want to study the convergence of this
algorithm as $\tau \to 0$. This is a very simplified model for the growth of a biological population,
which increases exponentially in size (supposing that there is enough food: see [17] for a more
sophisticated model) but is subject to a density constraint because each individual needs a
certain amount of space. Notice that this scheme formally follows the same evolution as in the
Hele-Shaw flow (this can be justified by the fact that, close to uniform density, the $W_2$ distance
and the $H^{-1}$ distance are asymptotically the same).
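
The exponential growth of the mass along the scheme (6.1) can be made concrete with a few lines of code. The following sketch only tracks the masses $|A|(1+\tau)^k$ and does not compute any Wasserstein projection; the function names, the initial mass 0.5, the final time 1 and the use of $\lfloor T/\tau \rfloor$ steps are simplifying assumptions made for illustration. It compares the mass reached at the final time with the continuous-time value $|A|e^{T}$.

```python
import math


def masses(initial_mass, tau, final_time):
    """Masses |A| (1 + tau)^k of the iterates, for k = 0, ..., floor(final_time / tau)."""
    steps = int(math.floor(final_time / tau + 1e-9))  # small guard against rounding
    return [initial_mass * (1.0 + tau) ** k for k in range(steps + 1)]


def final_mass_gap(initial_mass, tau, final_time):
    """Distance between the last discrete mass and the exponential limit |A| e^T."""
    discrete = masses(initial_mass, tau, final_time)[-1]
    return abs(discrete - initial_mass * math.exp(final_time))


# Hypothetical data: |A| = 0.5 and T = 1, for decreasing time steps.
gaps = [final_mass_gap(0.5, tau, 1.0) for tau in (0.1, 0.01, 0.001)]
```

The gaps shrink as $\tau \to 0$, in agreement with $(1+\tau)^{t/\tau} \to e^{t}$, which is the exponential growth in size mentioned above.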

Independently of the compactness arguments that we need to prove the convergence of the
scheme, we notice that, for fixed $\tau > 0$, all the densities $\varrho^\tau_k$ are indeed indicator functions (this
comes from the considerations in Section 5.1). Thus we have an evolution of sets. A natural
question is whether this stays true when we pass to the limit as $\tau \to 0$. Indeed, we generally
prove convergence of the scheme in the weak sense of measures, and it is well-known that,
in general, a weak limit of indicator functions is not necessarily an indicator itself. However
Theorem 1.2 provides an a priori bound on the perimeter of these sets. This BV bound allows us to
transform weak convergence as measures into strong $L^1$ convergence, and to preserve the fact
that these densities are indicator functions.

Notice on the other hand that the same result could not be applied in the case where the
projection was performed onto $K_f$, for a non-constant $f$. The reason lies in the term $2 \int |\nabla f|$
in the estimate we provided. This means that, a priori, instead of being decreasing, the total
variation could increase at each step by a fixed amount $2 \int |\nabla f|$. When $\tau \to 0$, the number of
iterations diverges and this does not allow us to prove any BV estimate on the solution. Yet, a
natural question would be to prove that the set evolution is well-defined as well, using maybe
the fact that these sets are increasing in time.

6.4. Crowd movement with diffusion. In [16, 22] crowd movement models are studied where a density $\varrho$
evolves according to a given vector field $v$, but subject to a density constraint $\varrho \le 1$.
This means that, without the density constraint, the equation would be $\partial_t \varrho + \nabla \cdot (\varrho v) = 0$, and
a natural way to discretize the constrained equation would be to set $\tilde\varrho^\tau_{k+1} = (\mathrm{id} + \tau v)_\# \varrho^\tau_k$ and
then $\varrho^\tau_{k+1} = P_{K_1}[\tilde\varrho^\tau_{k+1}]$.

What happens if we want to add some diffusion, i.e. if the continuity equation is replaced
by a Fokker-Planck equation $\partial_t \varrho - \Delta\varrho + \nabla \cdot (\varrho v) = 0$? Among other possible methods, one
discretization idea is the following: define $\tilde\varrho^\tau_{k+1}$ by following the unconstrained Fokker-Planck
equation for time $\tau$ starting from $\varrho^\tau_k$, and then project. In order to get some compactness of the
discrete curves we need to estimate the distance between $\varrho^\tau_k$ and $\tilde\varrho^\tau_{k+1}$. It is not difficult to see
that the speed of the solution of the Heat Equation (and also of the Fokker-Planck equation) for
the distance $W_p$ is related to $\|\nabla\varrho\|_{L^p}$. It is well known that these parabolic equations regularize
and so the $L^p$ norm of the gradient will not blow up in time, but we have to take into account
the projections that we perform at every time step $\tau$. From the discontinuities that appear in
the projected measures, one cannot expect that $W^{1,p}$ bounds on $\varrho$ are preserved. The only
reasonable bound is for $p = 1$, i.e. a BV bound, which is exactly what is provided in this paper.

The application to crowd motion with diffusion has been studied by the second and third 
authors in [20]. 


6.5. BV estimates for some degenerate diffusion equation. In this subsection we apply 
our main Theorem 1.1 to establish BV estimates for some degenerate diffusion equations. 
BV estimates for these equations are usually known, and they can be derived by looking at 
the evolution in time of the BV norm of the solution. Theorem 1.1 allows one to give an optimal 
transport proof of these estimates. Let $h : \mathbb{R}_+ \to \mathbb{R}$ be a given super-linear 
convex function and let us consider the problem 


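(6.2) 

$$
\begin{cases}
\partial_t \rho_t = \nabla \cdot \big( h''(\rho_t)\, \rho_t \nabla \rho_t \big), & \text{in } (0, T] \times \mathbb{R}^d, \\
\rho(0, \cdot) = \rho_0, & \text{in } \mathbb{R}^d,
\end{cases}
$$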


where $\rho_0$ is a non-negative BV probability density. We remark that, along the evolution, 
$\rho_t$ remains a non-negative probability density for any $t \in (0, T]$. In the case 
$h(\rho) = \rho^m/(m-1)$, equation (6.2) is precisely the porous medium equation 
$\partial_t \rho = \Delta(\rho^m)$ (see [26]). 
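
For the reader's convenience, the computation behind this identification can be written out as follows, assuming $m > 1$ so that $h$ is convex and super-linear:

$$
h(\rho) = \frac{\rho^m}{m-1}
\;\Longrightarrow\;
h'(\rho) = \frac{m}{m-1}\,\rho^{m-1},
\qquad
h''(\rho) = m\,\rho^{m-2},
$$

$$
h''(\rho)\,\rho\,\nabla\rho = m\,\rho^{m-1}\nabla\rho = \nabla(\rho^m),
\qquad\text{so that}\qquad
\nabla \cdot \big( h''(\rho)\,\rho\,\nabla\rho \big) = \Delta(\rho^m).
$$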

Since the seminal work of F. Otto ([21]) we know that the problem (6.2) can be seen as a 
gradient flow of the functional 

$$
\mathcal{F}(\rho) := \int_{\mathbb{R}^d} h(\rho)
$$

in the space $(\mathcal{P}(\mathbb{R}^d), W_2)$. As a gradient flow, this equation can be 
discretized in time through an implicit Euler scheme. More precisely, let us take a time step 
$\tau > 0$ and consider the following scheme: $\rho^\tau_0 := \rho_0$ and 

(6.3) 

$$
\rho^\tau_{k+1} := \operatorname{argmin}_{\rho} \left\{ \frac{1}{2\tau} W_2^2(\rho, \rho^\tau_k) + \mathcal{F}(\rho) \right\},
\qquad k \in \{0, \dots, N-1\},
$$

where $N := \left[ \frac{T}{\tau} \right]$. Using piecewise constant and geodesic interpolations 
between the $\rho^\tau_k$'s, with the corresponding velocities and momenta, it is possible to 
show that as $\tau \to 0$ we get a curve $\rho_t$, $t \in [0, T]$, in 
$(\mathcal{P}(\mathbb{R}^d), W_2)$ which solves 


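$$
\begin{cases}
\partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0, \\
v_t = -h''(\rho_t) \nabla \rho_t,
\end{cases}
$$

hence 

$$
\partial_t \rho_t - \nabla \cdot \big( h''(\rho_t)\, \rho_t \nabla \rho_t \big) = 0,
$$

that is, $\rho_t$ is a solution to (6.2); see [2] for a rigorous presentation of these facts. 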
We now note that Theorem 1.1 implies that 

$$
\int_{\mathbb{R}^d} |\nabla \rho^\tau_{k+1}| \, dx \le \int_{\mathbb{R}^d} |\nabla \rho^\tau_k| \, dx,
$$



hence the total variation decreases along the sequence $\rho^\tau_0, \dots, \rho^\tau_N$. As the 
estimates do not depend on $\tau > 0$, this remains true in the limit $\tau \to 0$. Hence 
(assuming uniqueness for the limiting equation) we get that for any $t, s \in [0, T]$, $t > s$, 

$$
TV(\rho_t, \mathbb{R}^d) \le TV(\rho_s, \mathbb{R}^d),
$$

and in particular for any $t \in [0, T]$ 

$$
TV(\rho_t, \mathbb{R}^d) \le TV(\rho_0, \mathbb{R}^d).
$$
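
The monotonicity of the total variation can also be observed numerically. The sketch below is not the minimizing-movement scheme (6.3): it swaps in a plain explicit finite-difference scheme for the one-dimensional porous medium equation $\partial_t \rho = \partial_{xx}(\rho^m)$ on a periodic grid, and simply records the discrete total variation at each step. The exponent $m = 2$, the grid, the initial datum and the function name `porous_medium_tv` are all illustrative assumptions.

```python
import numpy as np

def porous_medium_tv(m=2.0, n=200, steps=500):
    """Explicit scheme for rho_t = (rho^m)_xx on a periodic 1D grid.

    Returns the discrete total variation and the mass at every step.
    This is a finite-difference illustration, not the JKO scheme.
    """
    x = np.linspace(-1.0, 1.0, n, endpoint=False)
    dx = x[1] - x[0]
    # Piecewise-constant (hence BV) initial datum: two plateaus.
    rho = np.where(np.abs(x) < 0.3, 1.0, 0.0) + np.where(np.abs(x - 0.5) < 0.1, 0.5, 0.0)
    rho /= rho.sum() * dx                      # normalize to unit mass
    # Time step restriction keeping the scheme monotone:
    # 2 * dt/dx^2 * (d/drho rho^m at max rho) <= 1, with a safety factor.
    dt = 0.4 * dx**2 / (m * rho.max() ** (m - 1))
    tvs, masses = [], []
    for _ in range(steps):
        tvs.append(np.abs(np.roll(rho, -1) - rho).sum())   # discrete TV
        masses.append(rho.sum() * dx)
        phi = rho ** m
        # Conservative three-point discretization of (rho^m)_xx.
        rho = rho + dt / dx**2 * (np.roll(phi, -1) - 2.0 * phi + np.roll(phi, 1))
    return np.array(tvs), np.array(masses)

tvs, masses = porous_medium_tv()
print("TV non-increasing:", bool(np.all(np.diff(tvs) <= 1e-10)))
```

Under the stated step restriction the scheme is monotone and conservative, so the recorded total variation should never increase (up to rounding) while the mass stays equal to one, in line with the estimate above.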